Python's Summer of Code 2016 Updates

September 06, 2016

chrisittner
(pgmpy)

Feature summary of BN structure learning in python pgm libraries

This is a (possibly already outdated) summary of structure learning capabilities of existing Python libraries for general Bayesian networks.

libpgm.pgmlearner

  • Discrete MLE Parameter estimation
  • Discrete constraint-based Structure estimation
  • Linear Gaussian MLE Parameter estimation
  • Linear Gaussian constraint-based Structure estimation

Version 1.1, released 2012, Python 2
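
For orientation, here is a minimal sketch of how libpgm's constraint-based structure estimation is typically invoked (method and attribute names as documented for libpgm 1.1; the toy data is for illustration only, a real run needs far more samples):

from libpgm.pgmlearner import PGMLearner

# one dict per observed sample
data = [
    {"Rain": "yes", "Sprinkler": "no",  "WetGrass": "yes"},
    {"Rain": "no",  "Sprinkler": "yes", "WetGrass": "yes"},
    {"Rain": "no",  "Sprinkler": "no",  "WetGrass": "no"},
]

learner = PGMLearner()
# constraint-based (PC-style) structure estimation on discrete data
skeleton = learner.discrete_constraint_estimatestruct(data)
print(skeleton.V)  # estimated nodes
print(skeleton.E)  # estimated directed edges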

bnfinder (also here)

  • Discrete & Continuous score-based Structure estimation
    • scores: MDL/BIC (default), BDeu, K2
    • supports restricting the data set to a subset, per node
    • supports restricting the parent set, per node
    • allows restricting the search space (max. number of parents)
    • search method??
  • Command line tool

Version 2, 2011-2014?, Python 2

pomegranate

  • Discrete MLE Parameter estimation
  • Can be used to estimate missing values in incomplete data sets prior to model parametrization

Version 0.4, 2016, Python 2, possibly Python 3

Further relevant libraries include PyMC, BayesPy, and the Python Bayes Network Toolbox. Also check out the bnlearn R package for more functionality.

by Chris Ittner at September 06, 2016 10:00 PM

August 22, 2016

Avishkar Gupta
(ScrapingHub)

Bad Design, Finalising & a Lookback

Hi, since this is the final one, and because I’m somebody who always has their Eureka moments in the nick of time, this is going to be a long one. But I think you’ve had enough of short and sweet from my side.

If you don’t like to read much, I suggest you turn back now.

So, as always, to reiterate: the goal of this project is a sweet and simple one: move signals away from PyDispatcher in Scrapy. As you know, the approach that we chose to follow was to go the Django way and use django.dispatch as a starting point.

For the last couple of weeks, y’all have been hearing about my fabled benchmarking suite. The reason I never shared a link to it in my blog posts was that I could never decide whether the way I was presenting the results was right, but with the deadline coming up, I settled on the perf module that is used in Djangobench. Before we move any further, here’s a link to the benchmark suite and here’s a sample output of those benchmarks. Now that that’s out of the way, the rest of the post is going to concentrate on how we got there, and a problem that I encountered in the final weeks due to over-engineering on my part.

As you can see, at this moment the receiver_no_kwargs benchmark is on average about 1.5 times faster than how it previously worked. Now, let me tell you a little something: robust_apply is a nuisance, so the original authors of django.dispatch had the brilliant idea of completely breaking backward compatibility and getting rid of receivers that do not take variable keyword arguments. That, however, is not an option that I could follow. So, being an amateur who hasn’t had to deal with breaking compatibility up to this point, I decided not to introduce the same in scrapy.dispatch; rather, I tried to work up some magic inside scrapy.signalmanager, specifically the following “hack”:

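# cache a **kwargs-accepting proxy for each receiver that needs robust_apply, keyed by its repr()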
if receiver.__repr__() not in self._patched_receivers:
    self._patched_receivers[receiver.__repr__()] = \
            lambda sender, **kw: _robust_apply(receiver, sender, **kw)

Right, so as you can see, the flow here was crawler -> SignalManager -> Dispatch -> back to proxy in SignalManager. The method call overhead that was now introduced into the mix spelt disaster for this benchmark. Disaster. Not only that, sending to receivers that were not proxied in this way was 0.5X slower than the raw dispatcher-to-dispatcher benchmark. Owing to this over-engineered mess, I took on the task of writing this part again, with just a couple of weeks of the coding period left to go. Thanks to the constant support that I’ve had from my mentor Jakob along the way, I’m relieved to say I was able to accomplish it (which you already knew if you took the time to go through the benchmarks at the start :P).

Another important design decision this week was the one to deprecate scrapy.utils.signal. You’d think that would be trivial since we’re moving to scrapy.dispatch, but the original plan was for the methods in there to serve as pass-through methods between scrapy.signalmanager and scrapy.dispatch. That, however, is no longer the case, and our benchmark for receivers that accept kwargs shows that there is little to no overhead between that and raw signal performance.

So, with unit tests done, benchmarks done, optimizations done, it came down to the documentation. Now, I didn’t realize that all Python documentation ever is written in reStructuredText. So I used up a couple of days to get the documentation done; however, I’m pleased to tell you that even though I’ve not shared it as of yet, it too is done.

However, if you did look through the benchmarks, you would have noticed that the connection time of signals to receivers has actually increased instead of decreasing. Well, most of that time is taken up in resolving whether a receiver accepts keyword arguments, and in raising a deprecation warning. So even though it reads as being 2.5X slower than before, that time is actually negligible. Plus, in Scrapy, unlike Django, it’s not connecting to signals that’s the problem, it’s sending them multiple times.
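
For the curious, such a check can be done with the standard inspect module; this is only a minimal sketch of one way to do it (with a hypothetical accepts_kwargs/connect pair), not necessarily how scrapy.dispatch implements it:

import inspect
import warnings

def accepts_kwargs(receiver):
    # does the callable declare a **kwargs parameter?
    try:
        spec = inspect.getfullargspec(receiver)  # Python 3
        return spec.varkw is not None
    except AttributeError:
        spec = inspect.getargspec(receiver)      # Python 2 fallback
        return spec.keywords is not None

def connect(receiver):
    # warn once at connect time, so no extra cost is paid on every send
    if not accepts_kwargs(receiver):
        warnings.warn("receivers without **kwargs are deprecated",
                      DeprecationWarning)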

Looking back, I would say, at least from where I stand, the project was a success. I was able to learn a ton of new stuff and make something cool. At the end of the day, that’s all that matters, right? Also, I got to work with some wonderful people at Scrapinghub, especially my mentor Jakob, who put more thought into code review than I guess I put into writing it :), but then again, I digress. I would like to extend this longer, but I’ll be posting the three-page document I’ll be submitting to Google on here too, so I guess you can read about it then. Until then, Rootavish signing off.

August 22, 2016 11:00 PM

Ravi Jain
(MyHDL)

The coding weeks are over!

So people, the coding weeks are over. This post is a reference to the work I did during this period, highlighting the goals achieved and the outstanding work.

The task was to develop a Gigabit Ethernet Media Access Controller (GEMAC) (the MAC sublayer) in accordance with the IEEE 802.3-2005 standard using MyHDL. The aim was to test and help the development of MyHDL 1.0dev, while also demonstrating its use to other developers.

In brief, the work done includes developing the Management block and the core blocks, i.e., the Transmitter and Receiver Engines with Address Filter and Flow Control. The work left includes developing the interfacing blocks, i.e., the FIFOs (Rx and Tx) and the GMII.

Post mid-term I started implementing the core blocks. Midway I realised that I would be better off using finite state machines to implement them, which led me to rewrite those blocks. Currently, I am looking towards implementing the interfacing blocks: the FIFOs (for which I shall try to use blocks already developed by other developers) and the GMII (which depends on the PHY; I will be using the one that comes on the Zedboard, meaning I would be developing an RGMII interface).
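
As an aside, this is roughly what an FSM skeleton looks like with MyHDL 1.0's @block/always_seq/enum API; it is a generic illustration only, not code taken from the GEMAC repository:

from myhdl import block, always_seq, enum, Signal

t_state = enum('IDLE', 'PREAMBLE', 'DATA')

@block
def tx_fsm(clk, reset, start, busy):
    # hypothetical three-state transmit FSM, for illustration
    state = Signal(t_state.IDLE)

    @always_seq(clk.posedge, reset=reset)
    def logic():
        if state == t_state.IDLE:
            busy.next = False
            if start:
                state.next = t_state.PREAMBLE
        elif state == t_state.PREAMBLE:
            busy.next = True
            state.next = t_state.DATA
        else:  # DATA
            state.next = t_state.IDLE

    return logic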

Tests for each block were developed using pytest. Separate tests were developed for each unique feature to ensure it works. Convertibility tests were also developed to check the validity of the converted Verilog code, which shall be used for hardware testing in the end.

Main Repo : https://github.com/ravijain056/GEMAC/

Links to PRs:
1. Implemented Modular Base: https://github.com/ravijain056/GEMAC/pull/1
2. Implemented Management Module: https://github.com/ravijain056/GEMAC/pull/4
3. Implemented Transmit Engine: https://github.com/ravijain056/GEMAC/pull/5
4. Implemented Receive Engine: https://github.com/ravijain056/GEMAC/pull/6

My main focus after I am done with this is to make this code approachable by other developers by providing various good examples of using the library.


by ravijain056 at August 22, 2016 10:37 PM

srivatsan_r
(MyHDL)

What did I do?

My final GSoC 2016 post!

What did I do?

HDMI Source/Sink Project (Project Repo Link):

  • Created interfaces and transactors to connect different modules and to provide inputs.
  • Created the HDMI transmitter model and receiver model.
  • Created both the HDMI transmitter and receiver cores.
  • Created tests for all the modules, including the models and interfaces.
  • Integrated Travis-CI, landscape.io and coveralls.io with the GitHub repo to give an automatic evaluation of the changes made to the project.
  • All modules were made convertible to Verilog.


PRs were made after each significant addition to the project.

PR links :

The complete documentation for the project was hosted in readthedocs.

Documentation : link

RISC-V Processor (https://github.com/jck/riscv):

I assisted in developing the RISC-V processor by writing some tests and implementing some modules in MyHDL.

Wrote tests for

Created modules like

PR links : links

What’s pending?

I was assigned the task of making the RISC-V core generate video and transmit it via the HDMI transmitter.

My mentor told me not to do that and to focus instead on creating the RISC-V processor core, as the core was not complete and required two people working on it in parallel.

GSoC was fun and I gained a lot of knowledge!


by rsrivatsan at August 22, 2016 06:37 PM

Karan_Saxena
(italian mars society)

GSoC 2016 Submission


Concluding Google Summer of Code 2016 


Name: Karan Saxena


Project Name: IMS V-ERAS: Improving the Step Recognition Algorithm


Working on: Python, PyTango, Kinect V2 sensor by Microsoft
Student Application: http://www.ims-gsoc.org/#improve-step-recognition


Project Mentors:

Ambar Mehrotra: https://bitbucket.org/mehrotraambar/
Antonio Del Mastro: https://bitbucket.org/aldebran/

Commit List: https://github.com/mars-planet/mars_city/commits?author=karansaxena


Project Work:


Description:
Virtual ERAS (V-ERAS) forms a salient part of the European Mars Analog Station (ERAS) project of the Italian Mars Society (IMS).

The immersive VR simulation of V-ERAS allows users to interact with a simulated Martian environment using the Aldebran VSS Motivity, an Oculus Rift and a Microsoft Kinect.


Motivity is a passive omnidirectional treadmill, so the user’s steps do not cover real distances. Therefore, the configuration needs to include an accurate and robust algorithm for estimating the user’s steps, so that they can be reproduced in V-ERAS.

In the V-ERAS station simulation, the data used for the step recognition algorithm with the Microsoft Kinect are the skeletal joints, recognized by means of the skeletal tracking implemented in the Microsoft Kinect SDK (1.8).



However, there were two main issues with the previous setup, and the goal of my project was to rectify and overcome them:

  1. Enhance the accuracy of the former step recognition algorithm: at the time, it was uncertain whether the inaccuracy was totally or partially due to noisy recognition of the feet skeletal joints, or only due to a non-optimal algorithm. That needed to be investigated and rectified.
  2. Improve the recognition of the feet joints: in the former configuration (using the Kinect 1.8 SDK), the feet joints weren’t recognised precisely.


Deliverables achieved:


  • Setting up the Kinect v2.
  • Setting up PyTango with the Kinect to integrate with the ERAS environment.
  • Capturing the Body Frame (1920x1080) and the Depth Frame (512x424).
  • Attaching corresponding Body Frames and Depth Frames by overlaying the corresponding areas.
  • Calculating the distance moved (in metres) in real time by a skeleton in the frame (see the sketch after this list).
  • Testing the PyKinect2 and Tango setup.
  • Optimizing the output by tweaking certain parameters and changing input conditions (e.g. by wearing colored shoes).
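
For the distance calculation, the basic operation is the Euclidean distance between successive 3D joint positions; a minimal sketch (illustrative only, not the project's actual code):

import numpy as np

def distance_moved(prev_joint, curr_joint):
    # Euclidean distance between two (x, y, z) joint positions, in metres
    return float(np.linalg.norm(np.asarray(curr_joint) - np.asarray(prev_joint)))

# e.g. per-frame movement of a single joint
step = distance_moved((0.10, 0.50, 2.30), (0.12, 0.50, 2.28))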

Challenging Bits:


  • Setting up PyKinect2 was really difficult. I had to tweak the code a bit and hence supplied a custom library.
  • The Depth Frame was out of sync with the Body Frame.
  • Adding multi-threading support to simultaneously publish the values to the db/client.

Future Scope: 


  • Improve multi-threading for seamless updates.
  • Final integration in the production environment.

Acknowledgements:

I would like to thank:

Contact: LinkedIn

_________________________________________________________________________________

by Karan Saxena (noreply@blogger.com) at August 22, 2016 06:04 PM

Shridhar Mishra
(italian mars society)

Conclusion

Name: Shridhar Mishra

Sub org: Italian Mars Society.

Project:
Integration of a Unity game scene with the existing PyKinect setup, emulating a moving skeleton based on the movements tracked by the Kinect sensors.


Another year of GSoC is coming to an end and most of the proposed work has been completed.

List of commits:

https://github.com/mars-planet/mars_city/commits?author=shridharmishra4

Project description:

https://github.com/mars-planet/mars_city/tree/master/servers/unity

Things done:
  • Setting up Kinect 2 with the ERAS environment.
  • Data extraction from Kinect 2, which includes skeleton coordinates, infrared, and video output from the Kinect camera, using Python.
  • Setting up the Kinect with Tango Controls and sending skeleton data using a specialised numpy array (see the sketch after this list).
  • Unit testing of PyKinect2 and Tango.
  • Unit testing of the Unity game engine related code.
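
One way such skeleton data can be packed is a structured NumPy array with one record per joint; this is only an illustration with made-up field names, not the project's actual dtype:

import numpy as np

# the Kinect v2 tracks 25 joints per skeleton
joint_dtype = np.dtype([('name', 'S20'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
skeleton = np.zeros(25, dtype=joint_dtype)
skeleton[0] = (b'SpineBase', 0.10, 0.50, 2.30)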

Challenging bits:
  • Integration of the Kinect interface and the architecture-less communication using Tango.
  • Selection of data that is supported by Tango and feasible for a constant polling process, with the least possible latency in transmission.
  • Setting up Tango on Windows can also be an uphill task :-P.. follow this for hassle-free installation.

Future enhancements:
  • Multi-threading has to be improved so that Tango can run efficiently in a daemon thread.
  • Proper integration of Tango and Unity.
  • Add test cases.
  • Test the skeleton coordinates received at the Linux end.
  • Final integration testing and deployment.

References:
    by Shridhar Mishra (noreply@blogger.com) at August 22, 2016 01:19 PM

    Vikram Raigur
    (MyHDL)

    GSoC Final Report

    Completed work :

    1. Quantizer module.
    2. Run Length Encoder  Module.
    3. Huffman Module.
    4. Byte-Stuffer Module.
    5. Back-end Main module.
    6. Back-end module compatible with front-end.
    7. Input Buffer that can store three blocks from the front-end.
    8. Completed software prototype for back-end module.

     

    To-Do :

    1. Connecting front-end and back-end.
    2. Commit new back-end with new interfaces (front-end compatible).

    The back-end module that is committed into the main repository needs a couple of changes (adding a counter inside). I have added the counter but have not uploaded it yet (testing in progress).

    I made some changes to the quantizer module at the end, so a bit of code clean-up is needed in the test bench.

    The only main task left now is connecting front-end and back-end modules.

    All the modules are scalar, convertible and modular. I also wrote software prototypes for each module, so that we can check and compare the outputs from the test bench and the software prototype. All the test benches are convertible.

    The documentation for the overall project is at this link. Just enter the command make html and you will get the HTML version.

    The read-the-docs version (online version) of the project documentation is here.

    The above documentation covers the kind of work each module does, its interfaces, etc. It also covers the coverage results of each module.

    Github Links:

    The code coverage for the modules can be seen in the documentation.

    In near future, I will try to investigate Dynamic Huffman Encoding in the back-end.


    by vikram9866 at August 22, 2016 07:13 AM

    Pranjal Agrawal
    (MyHDL)

    GSoC final summary and development

    Hey guys,

    This is my final post in the GSoC series of posts for the MyHDL version of the Leros Tiny Processor. Here I will describe the work done, some of the challenges I faced, the work still remaining, and my future plans for this project.

    My GSoC project was to redesign the Leros tiny processor in MyHDL, convert and synthesize it, test it on hardware, and develop a command bridge assembly for it so that it can be interfaced with other rhea designs.
    The entire project repository can be found at:


    pyLeros Development Summary 


    In my GSoC proposal, I had outlined 5 major goals for the development of Leros:
    1. Writing the tools, like the simulator and assembler.
    2. Development of the processor and test suite.
    3. Code refactoring and conversion to MyHDL.
    4. Synthesis and hardware testing with examples.
    5. Writing of the UART + command bridge on Leros, the real-world application.

    I'm happy to tell you that goals 1, 2, and 3 are all over and done with. The various Python-based modules have been written and well tested (coveralls indicates that test coverage is currently around 88%). I wrote a simulator in Python for the processor, which has also been used for the tests. This is a good idea because processor simulators are much easier to write in software than the processors themselves, and this increases test coverage. Many of the bugs were caught this way. Many challenges were also faced, like hazards, and getting the timing down just right.

    That part (points 1 and 2) was done in the first 7 weeks. Unfortunately, it took more time than originally anticipated because the slightest of bugs in the description can send the program haywire. Also, debugging is hard because you actually have to go through the execution step by step, looking at the execution of each instruction.

    Then I started the code refactoring and conversion phase. This is important because lots of things that can be written in software and run perfectly in simulation either do not convert to VHDL or do not synthesize. For example, the decoder signals, which go from the decoder to the various components and drive the execution, were originally a list of signals. Since this can't be converted, I have instead used interfaces. Other things that were giving either no or poor results in VHDL were also changed.

    Synthesis and testing

    Finally, the synthesis and testing part, which meant further refining the structure of the MyHDL code so that the converted VHDL can be synthesized and is semantically correct too (this is important so that resources are not mis-assigned during synthesis). There are actually still some bugs in this; for example, the synthesizer keeps assigning a couple of thousand registers instead of on-board memory for the RAM.

    Examples and hardware testing

    Next came the examples. Because I felt like I was repeating almost exactly what Martin had done with his assembler in Java, I scrapped my half-written Python assembler and ported over the Java one in a day. Now I had examples that I wrote, some of the common tasks that can be used to test processors, for example sorting algorithms. These were written, assembled, tested in simulation, converted to VHDL, tested in VHDL simulation, and finally tested on the FPGA. And they work!

    External Design of the processor


    The main design of the processor goes as follows: we write an assembly file and assemble it. While instantiating the processor in our designs, we pass this file as an argument to the main pyleros @myhdl.block, and also connect the 16-bit I/O ports and 1-bit strobes to the appropriate places, and it just works. Such a design, along with the I/O, also has tests in the test suite. This was done because the processor is supposed to work as a general-purpose peripheral, so the memories have been included and are not exposed outside. This design can be modified as needed.

    Git development flow.

    The git development workflow I followed was something like this: a little of the initial development was done in the original branch. Then I moved development to the branch core. After the mid-terms, a PR was opened from the core branch to master, and development continued in dev-exp. dev-stg2 contains the code for simulation and some refactoring. The branch conversion contains the conversion refactoring, tests with the assembled code, and the generated VHDL, with the synthesis. Each branch was subsequently branched off the previous one after a PR was opened against the one before it. The branches have been merged to master after a successful testing phase, because of time constraints.

    The PR's can be viewed here:
    https://github.com/forumulator/pyLeros/pulls

    Future plans

    Unfortunately, one of the goals outlined, creating the command bridge on pyleros, I wasn't able to complete, and it has been postponed to after GSoC. However, I have a clear view in mind of what has to be done, and I expect to finish it off in the next couple of weeks. That way, I can also test the processor with existing rhea cores. Beyond that, the part remaining is writing more examples, trying to increase coverage to 100%, the works. In short, the majority (over 95%) of the project has been done.

    And that brings us to the end of this long post. It has been a long and eventful summer with many ups and downs along the way. But after all this, the project is finally done, and I can tell anyone who asks that summer 2016 was a summer well spent!

    by Pranjal Agrawal (noreply@blogger.com) at August 22, 2016 03:00 AM

    August 21, 2016

    sahmed95
    (dipy)

    Wrapping up : Google Summer of Code 2016

    Software development is challenging, and developing a robust package for use in the real world is a completely different experience from writing code for yourself. Google Summer of Code 2016 was a valuable learning experience which introduced me to Free and Open Source Software. I learnt the importance of writing robust, tested code which is easy to read and understand for other developers. GSoC also gave me an opportunity to interact with experienced mentors who were very patient in answering my doubts and suggesting good coding practices. With thorough code reviews and regular hangout sessions, my mentors went out of their way to help me with my project. The entire process of writing a proposal, getting involved with a new community, developing a package which will be used by a large number of users, testing the code and writing examples for it has given me the confidence to take up any project and work independently. I am sure this is the first of many more projects and contributions that I will get involved with in the FOSS world.

    One of the most interesting parts of GSoC for me was getting to know and work with people from different parts of the world. This introduced me to a completely new way of working in a team remotely, and I am now looking to get involved in more such projects.

    As someone who had very little previous experience in software development, one of the most difficult tasks was getting selected for the project, but I am very grateful that I got the chance. I already had some experience with mathematical modelling, and as a pre-final year student in Physics this project has helped me explore the topic thoroughly. I am sure this will be a great help in deciding my final year thesis, and whenever I develop code from now on, I will have the mindset of an open source developer and try to write code such that it can be used freely and developed further.

    by Shahnawaz Ahmed (noreply@blogger.com) at August 21, 2016 10:16 PM

    chrisittner
    (pgmpy)

    GSoC 2016 Work Product

    My GSoC 2016 work can be found here:

    In addition, these two Pull Requests are not yet merged:

    I wrote an introduction to pgmpy’s new structure learning features here, as a Jupyter Notebook:

    by Chris Ittner at August 21, 2016 10:00 PM

    SanketDG
    (coala)

    That's it, folks!

    So this is it. The end of my Google Summer of Code. An amazing 12 weeks of working on a real project with deadlines and milestones.

    Thanks, awesome mentor!

    First and foremost, I would like to thank my mentor Mischa Krüger for his constant guidance and support through the tenure of my project.

    Thank you for clarifying my trivial issues that were way too trivial. Thank you for clearing my doubts on the design of the classes. Thank you for writing a basic layout for a prototype bear. Thank you for understanding when I was not able to meet certain deadlines. Thank you Mischa for being an awesome mentor.

    The Beginning

    I was first introduced to coala in HackerEarth IndiaHacks Open Source Hackathon. I wanted to participate in it, so I took a look at the list of projects and saw coala. I jumped on their gitter channel and said hi. Lasse hit me back instantly, introduced me to the project, asked me to choose any newcomer issue, and my first patch got accepted in no time.

    As the hackathon came to an end, it was time for organisations to start thinking about Google Summer of Code. By then, I had been taking part in regular discussions, and code reviews, Lasse asked me if I’d like to do a GSoC:

    I slowly pivoted to choosing language independent documentation extraction as my GSoC project as I found it having greater depth than my other choices.

    I feel privileged to be contributing to coala. The project itself is awesome in its entirety. I have contributed to my fair share of open source projects and I have never found any other project that is so organized and newcomer friendly. How coala is awesome should be itself another post.

    About my project

    Now to my project. As stated repeatedly in my past posts, my project was to build a language independent documentation extraction and parsing library, and use it to develop bears (static analyzing routines.)

    How it all fits together

    Most of the documentation extraction routines had already been written by my mentor. Except for a couple of tiny bugs, they worked pretty well. The documentation extraction API is responsible for extracting the documentation given the language, docstyle and markers, and returning a DocumentationComment object.

    The DocumentationComment class defines one documentation comment along with its language, docstyle, markers, indentation and range.

    My first task was to write a language independent parsing routine that would extract metadata out of a documentation comment, i.e. description, parameter and return information. This resides inside the DocumentationComment class.

    The point of this parsing library is to allow bear developers to manipulate metadata without worrying about destroying the format.

    I then had to make sure that I had support for the most popular languages. I used the unofficial coalang specification to define keywords and symbols that are used in different documentation comments. They are being loaded along with the docstyle.

    Although I do not use the coalang stuff yet and still pass keywords and symbols manually, it will be used in future.

    Lastly, I had to implement a function to assemble a parsed documentation into a documentation comment.

    I separated this functionality into two functions:

    • The first function would take in a list of parsed documentation comment metadata and construct a DocumentationComment object from that. The object would contain the assembled documentation comment and its other characteristics. Note that this just assembles the inside of the documentation comment, not accounting for the indentation and markers.

    • The second function takes this DocumentationComment object and assembles it into a documentation comment, as it should be, taking account of the indentation and the markers.

    Difficulties faced

    • The first difficulty I faced was the design of the parsing module itself. With the help of my mentor, I was able to sort that out. We decided on using namedtuples for each piece of metadata (a short usage illustration follows after this list):
    Parameter = namedtuple('Parameter', 'name, desc')
    ReturnValue = namedtuple('ReturnValue', 'desc')
    Description = namedtuple('Description', 'desc')
    
    • If I wanted to make the library completely language independent, most settings would have to be configurable to the end user. Initially I hardcoded the keywords and symbols that I used, but later the coalang specification was used to define the settings. They are yet to be used in the library.

    • While trying to use the above mentioned settings, I realized that the settings extraction didn’t work for trailing spaces. Since I had to have settings with trailing whitespace, I had to fix the extraction in the LineParser class.
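
    For illustration, together with the namedtuple definitions above, the parsed metadata for a single documentation comment would simply be a list of such tuples (the names and descriptions below are made up for this example, not taken from coala's test data):

    parsed_doc = [
        Description(desc='Processes a value.'),
        Parameter(name='x', desc='the value to process'),
        ReturnValue(desc='the processed value'),
    ]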

    What has been done till now

    coala

    56e1802 DocumentationComment: Add language, docstyle param
    72b6c9c DocumentationComment: Add indent param
    bc4d7d0 DocumentationComment: Parse python docstrings
    337b7c1 DocumentationComment: Parse python doxygen docs
    99fa059 DocumentationCommentTest: Refactor
    fc2e3bf DocumentationComment: Add JavaDoc parsing
    12ede4f ConsoleInteraction: Fix empty line tab display
    07135f5 DocumentationExtraction: Fix newline parsing
    5df5932 DocumentationComment: Fix python parsing
    f731ee4 DocumentationComment: Remove redundant code
    e442dce TestUtils: Create load_testdata for loading docs
    7de9aed LineParser: Fix stripping for escaped whitespace
    31b0410 DocstyleDefinition: Add metadata param
    edc67aa DocumentationExtraction: Conform to pep8
    3a78aa9 DocumentationComment: Use DocstyleDefinition
    dc35a0a DocumentationComment: Add from_metadata()
    78ff315 DocumentationComment: Add assemble()
    3c239d7 setup: Package coalang files

    What lies ahead

    The API still has a long way to go. A lot of things can be added/improved:

    • Maybe the use of namedtuples is not that efficient. I think classes subclassed from these namedtuples should be used; this will allow the API to be way more flexible than it currently is, while retaining the advantages of using namedtuples.

    • A cornercase in assembling #2645

    • Range is not being calculated correctly. #2646

    • The API is not using the coalang symbols/keywords. #2629

    • A lot of things are just assumed from the documentation while parsing. Related: #2143

    • Trivial: #2617, #2616

    • A lot of documentation related bears can be developed from this API.

    It has been an awesome 3 months and an even more awesome 7 months of contributing to coala. That’s it folks!

    Other projects.

    Also, I want to talk about the projects of other students:

    • @hypothesist did an awesome job on coala-quickstart. The time saved in using coala-quickstart vs. writing your own .coafile is huge and this will lead to more projects using coala. He has also worked on caching files to speed up coala.

    • @tushar-rishav built coala-html! It’s a web app for showing your coala results. He has also been working on a new website for coala.

    • @mr-karan did some cool documentation for the bears and implemented syntax highlighting in the terminal.

    • @Adrianzatreanu worked on the Requirements API.

    • @Redridge’s work on External Bears will help you write bears in your favourite programming language.

    • @abhsag24 worked on the coalang specification. We can finally integrate language independent bears seamlessly!

    • Thanks to @arafsheikh, you can now use coala in Eclipse.

    August 21, 2016 05:30 PM

    tushar-rishav
    (coala)

    GSoC'16 final report

    GSoC

    Alright folks. It’s officially the end of the amazing 12 weeks of Google Summer of Code 2016! These 12 weeks have certainly helped me become a better person, both personally and professionally. I’ve had a chance to interact with and learn from some very interesting and amazing minds at coala. Sadly, I could only meet a few of them during the EuroPython conference this summer.

    Acknowledgement

    First and foremost, I would like to express my eternal gratitude to my mentor Attila Tovt, who patiently helped me improve the patch through multiple reviews. I was learning something new with each iteration. The major takeaway for me from his mentorship would be:

    There is always a hack to something, then there is a solution to it.

    Sadly, I couldn’t meet him during EuroPython conference, but I am hopeful to pay my regards in person someday! :)

    Next, I must thank Abdeali for helping and guiding me when I started my journey with coala, and also Lasse Schuirmann. Well, after all, he is coala’s BDFL! Wondering what that means? Benevolent Dictator For Life :D

    Hmm, on a serious note, if I say I am going to continue contributing to coala or any other FOSS project that I may come across in the future, one of the major reasons and influences would be Lasse! The guy is totally amazing. I don’t think anything else could describe him better than his own words during EuroPython’16:

    Guys! Don’t just be a participant. It’s boring! Be and create a conference!

    Impressive isn’t it? That’s Lasse for you! :)

    I would also like to thank fellow GSoC students and now my new friends - Adrian, Adhithya, Alex, Karan, Araf, Sanket and Abhay. I look forward to staying in touch with them even after GSoC! :)

    Last but far from least, thanks to Max for giving such a wonderful lightning talk at EuroPython’16, for cooking for us with all his love at Bilbao, and for being such an amazing and wonderful person.

    The acknowledgement must end with my gratitude to coala, Google and Python Software Foundation for giving me an opportunity to be a GSoC student in the first place.

    Work history

    This past summer, I’ve contributed to and maintained the coala-html and coala-website projects. The commits and live demos are available online.

    Table 1

    Project: coala-html (demo)
    Commits: coala-html commits, 49 in total
    Status: completed

    Project: coala website (repo; coala embedded in the coala-website demo)
    Commits: listed in Table 2
    Status: almost complete, but the design needs improvement

    Table 2

    Commits for the coala website repository, listed here because GitLab doesn’t support filtering commits by author yet.

    Commit SHA Shortlog
    244a0b Add coala config, .gitignore and README
    f9242a runserver.py: Init setup with flask
    36f064 server: Add editor and preloader
    51ac6e bower.json: Use bower to install dependencies
    ac3cdd beardoc: Display what coala can do interactively
    dd4f27 layout.html: Add donation, about and sponsor
    1b1efa index.html: Add gitter chat
    f90513 bear-doc: Include bear-doc
    597a2a route.py: Add contributors
    9a3491 requirements.txt: Add python dependencies
    432f4f README.md: Add Installation instructions
    220020 sitemap: Add sitemap_template
    bd31c1 editor.html: Add editor

    Although the GSoC period may end, my contributions to FOSS won’t! :)

    Regards,
    Tushar

    August 21, 2016 02:58 PM

    John Detlefs
    (MDAnalysis)

    My Summer of Code Experience

    The summer is over and I am very proud to report that I have accomplished most of the goals I set before starting work with MDAnalysis.

    A TL;DR of my GitHub commits to MDAnalysis:

    April (before GSoC officially started)

    • Fixed a bug in the RMSD calculation code to give the right value when given weights.
    • Eliminated a dependency on the BioPython PDB reader and updated documentation. The ‘primitive’ PDB reader replaced BioPython.
    • Added an array indexed at zero for the data returned by hydrogen bond analysis.

    May (Official work started on the 23rd.)

    • I refactored most of the code in MDAnalysis/analysis/align.py to fit our new analysis api. I improved documentation where I could and wrote a new class called AlignTraj that fits the new API.

    June

    • I was reading and writing a lot about dimension reduction, and during this time I wrote the Diffusion Maps module MDAnalysis/analysis/diffusionmap.py

    July

    • I finished diffusion maps, and significantly increased coverage for the RMSD class in the analysis package.
    • I went to SciPy!
    • Started work on PCA

    August

    • I fixed some problems in the DCD reader which involved dealing with C code.
    • I finished the PCA module
    • I started (and am very close to finishing) work on a refactor for RMSD calculation to align with the new API.

    Times like this call for reflection and honesty. I feel as though there were good moments and some bad moments, but for the most part I turned in a perfectly satisfactory performance. If I were to give myself a grade, I’d give myself a B. It may be one of those B’s where my raw grade was a B-, but I clearly worked hard and know more at the end than when I started, so I feel as though I’ve earned the bump.

    Why not an A, you ask?


    There were some weeks over the summer where I really felt like I wasn’t reviewing and iterating on my own work in an incisive fashion. It is easy to write code, commit some changes, and then wait for your mentor to make comments on what you should fix. Around week five or six, I could have done a better job at imagining I was my mentor and predicting his suggestions.

    In addition, there were some times in which I self-assigned bug fixes and projects that turned out to be far harder than I anticipated. (One of them was a reader rewind issue that required more understanding of the Reader API than I currently have.) Having respect for the complexity of code and the amount of time it takes to go in and dissect a problem in order to figure out how to fix it is something that I hope to keep improving.

    What I did well


    • I worked hard on setting a realistic timeline at the beginning of the summer and achieved all but the last item on the list.
    • I communicated frequently and did not hesitate to ask questions that came up.
    • I helped discover and fix some bugs in old C code, which required some self-teaching.
    • I read tech papers and taught myself some dimension reduction algorithms so that I could confidently justify the code I wrote.

    Historically I think I’ve overpromised and underdelivered on projects, but in this case I think I did a decent job of delivering on the work I promised and doing more when I could. I never got to go in and try to figure out parallel trajectory analysis, but I am still optimistic that I can find the time to work on this soon.

    What have I learned?


    • It is often easier and more productive to write crappy code and iterate on it rather than trying to sit at your computer and come up with the perfect code on your first try.

    • Weekends are an important break to prevent long term mental fatigue.

    • Motivation can come and go in waves, ride the wave. Don’t get bummed out and hard on yourself when you find yourself in a lull.

    • IRC or Slack or Gitter is really great to have an informal avenue for questions. Mailing lists and email is overly formal for the kinds of questions I want to ask as a noobie programmer.

    • Don’t read into things, people get busy, if someone doesn’t respond to an email you probably haven’t done anything wrong.

    • The only way to solve hard problems is to break them down into discrete chunks. This is a skill that I have yet to master but am working on improving.

    • I approach health in a very rigorous and number-oriented way. I have been thinking that if I treated reading and self-improvement the way I treat exercise, I could do a better job at being productive.

    • A significant portion of working on large software projects is communicating technical ideas rather than purely writing code. Being able to write prose clearly for all audiences is a skill I am still working to improve.

    August 21, 2016 12:00 AM

    August 20, 2016

    tushar-rishav
    (coala)

    Mutable default arguments in Python

    Recently, I came across an interesting feature in Python and thought I should share it with everyone.

    Suppose we have a code snippet that looks like:

    def func(default_immutable_arg="Hello",
             default_mutable_arg={}):
        key = 'some_key'
        default_immutable_arg += "!"
        print(default_immutable_arg),
        if default_mutable_arg.get(key, None):
            print("{key} exists".format(key=key))
        else:
            print("{key} doesn't exist".format(key=key))
        default_mutable_arg[key] = "some_value"

    for i in range(3):
        func()

    Before reading further, please stop and go through the code snippet carefully.

    Now, what output do you expect from the above snippet? There is a high probability (unless you are a Python wizard!) that you might expect the output to be:

    Hello! some_key doesn't exist
    Hello! some_key doesn't exist
    Hello! some_key doesn't exist

    Well, if that’s what you expected, then you are wrong!

    The correct output is

    Hello! some_key doesn't exist
    Hello! some_key exists
    Hello! some_key exists

    Interesting isn’t it? Let’s dig a little deeper and find out why this happens.

    Perhaps it’s quite well known by now:

    Expressions in default arguments are calculated when the function is defined, not when it’s called.

    Before I explain further, let’s verify the above statement quickly:

    import time

    def time_func(arg=time.time()):
        return arg

    print [time_func() for _ in range(3)]

    For me the output looks like

    [1471723451.85, 1471723451.85, 1471723451.85]
    # Notice the exact same timestamps here.

    Clearly, arg was calculated when time_func was defined and not when it’s called; otherwise you would expect arg to be different each time it’s executed.

    Coming back to our func example: when the def statement is executed, a new function object is created, bound to the name func and stored in the namespace of the module. Within the function object, for each argument with a default value, an object is created that holds the default value.

    In the above example, a string object (“Hello”) and an empty dictionary object are created as the defaults for default_immutable_arg and default_mutable_arg respectively.

    Now, whenever func is called without arguments, the arguments take their values from these default bound objects, i.e. default_immutable_arg will always start as “Hello” but default_mutable_arg may change. This is because string objects are immutable whereas dictionary objects are mutable. So whenever we append “!” to default_immutable_arg (line 4 of the snippet), a new string object is created and returned, which is then printed in the next line, keeping the default string object’s value intact.

    This isn’t the case with the mutable dictionary object. The first time we execute func without any arguments, default_mutable_arg takes its value from the default dictionary object, which is {} at that point. Hence, the else block is executed. Since dictionary objects are mutable, the else block changes the default dictionary object. So on the next execution of the function, when default_mutable_arg reads from the default dictionary object, it receives {'some_key': 'some_value'} and not {}. Interesting huh? Well, that’s the explanation! :)
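
    You can see this directly on the function object itself: the defaults are stored in its __defaults__ attribute (func_defaults on older Python 2 versions), and the dictionary there is mutated in place:

    print(func.__defaults__)   # ('Hello', {})
    func()
    print(func.__defaults__)   # ('Hello', {'some_key': 'some_value'}) - the same dict, now mutated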

    Solution

    Don’t use mutable objects as default arguments! Simple! So how do we improve our func? Well, just use None as the default value and check for None inside the function body to determine whether the argument was passed.

    def func(default_immutable_arg="Hello",
             default_mutable_arg=None):
        default_mutable_arg = ({} if default_mutable_arg is None
                               else default_mutable_arg)
        # rest is same..

    Now, you can imagine what would happen if we overlooked this behaviour when defining class methods. All the class instances would share references to the same default object, which isn’t something you’d want in the first place! :)
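
    A small illustration of that class pitfall (the Basket class here is made up for the example):

    class Basket:
        def __init__(self, items=[]):   # one list object shared by every Basket instance
            self.items = items

    a = Basket()
    b = Basket()
    a.items.append('apple')
    print(b.items)   # ['apple'] - b sees a's change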

    I hope that was fun,

    Cheers!

    August 20, 2016 08:33 PM

    Aron Barreira Bordin
    (ScrapingHub)

    Scrapy-Streaming - Support for Spiders in Other Programming Languages

    This page describes the Scrapy Streaming project, the summer goals, submitted patches, and a simple overview of the results.

    About this project - Abstract

    Scrapy is one of the most popular web crawling and web scraping frameworks. It’s written in Python and known for its good performance, simplicity, and powerful API. However, it’s only possible to write Scrapy spiders using the Python language. The goal of this project is to provide an interface that allows developers to write spiders using any programming language, using JSON objects to make requests, parse web content, get data, and more. Also, a helper library will be available for Java, JS, and R.

    Scrapy Streaming

    This project was named Scrapy Streaming, and it’s a Scrapy extension that provides a JSON layer for developing spiders in any language.

    You can download and play with it in the scrapy-plugins/scrapy-streaming repository.

    Development

    In this section, you can read an overview of the development progress and the submitted pull request related to each topic.

    Documentation

    I started the Scrapy Streaming development by defining the project API, writing the Communication Channel spec, and writing about its usage.

    I considered it important to start with the project docs because it gave me some time to share my API proposal and discuss it with the Scrapy developers and contributors to get some feedback, even before starting with the development.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/7

    Documentation url: http://gsoc2016.readthedocs.io/

    Travis and Unit Tests

    Before coding, I defined the project and source code structure, adding travis, configuring codecov to check the test coverage, and implemented some initial tests.

    To end up with a good project, codecov would help me ensure that everything under development was tested.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/2

    Scrapy Commands

    Scrapy has a command line interface.

    Scrapy Streaming adds the streaming command to the CLI. The streaming command allows you to execute scripts/executables as a spider, using the following command:

    scrapy streaming my_executable -a arg1 -a arg2 -a arg3,arg4
    

    For example, to run a R spider named sample.R, you can use the following command:

    scrapy streaming Rscript -a sample.R
    

    If you are using a Scrapy project, you can also add multiple spiders. To do this, you must add a file named external.json to the project root, similar to the following example:

    [
        {
          "name": "java_spider",
          "command": "java",
          "args": ["/home/user/MySpider"]
        },
        {
          "name": "compiled_spider",
          "command": "/home/user/my_executable"
        }
    ]

    and then run it using the crawl command. For example:

    scrapy crawl java_spider
    

    This definition lets you implement multiple spiders using multiple programming languages and organize them in the same project.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/3

    Communication Protocol

    The Communication Protocol is a JSON API that lets any process communicate with Scrapy, letting you develop spiders in any programming language.

    This section was a bit challenging because I needed to write solid code that handles stdin/stdout buffering and copes with user mistakes, system errors, processing performance, Scrapy problems, and spider problems.

    This is the core of the project and the most important patch, so I added a lot of unit tests to ensure it works well on both Python 2 and Python 3.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/5

    Examples

    Something very important in this project is the spider examples. Future users may depend on the initial examples to learn and use Scrapy Streaming.

    I added examples in Python, R, Node.js, and Java, describing basic usage. Also, the documentation contains a Quickstart section that provides a brief explanation of how to use it in each programming language.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/4

    R package

    To help the development of spiders using the R language, I implemented an R package to wrap the communication channel and help developers. It contains unit tests, documentation, and examples, and it’s developed using the standard R package structure.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/8

    Java library

    To help the development of spiders using the Java language, I implemented a Java library to wrap the communication channel and help developers. It contains unit tests, documentation, and examples, and it’s developed using the standard Java library structure with Maven.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/9

    Node.js package

    To help the development of spiders using JavaScript (with Node.js), I implemented a Node.js package to wrap the communication channel and help developers. It contains unit tests, documentation, and examples, and it’s developed using the standard Node.js package structure.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/10

    Selectors

    My initial proposal had a feature called “Selectors”. With selectors, you could extract data from the HTML response using CSS/XPath filters in the communication protocol. This part was removed from the proposal because it would add more complexity to the project.

    I implemented a proof of concept, and I’d be happy to implement it fully after the summer because I consider it a very important feature for Scrapy Streaming.

    Pull request: https://github.com/scrapy-plugins/scrapy-streaming/pull/11

    WIP

    Here I highlight some topics that are not yet done.

    • Publish the documentation on an official page (currently I’m using my personal readthedocs page to host the docs.)
    • Publish Scrapy Streaming to PyPI
    • Publish helper packages to the following repositories (the code is ready to be published):
      • Java -> Maven Central Repository
      • R -> CRAN
      • Node.js -> NPM
    • Some pull requests require review.

    Final Considerations

    This summer was very good for me. I was able to develop the project and I achieved my goals. Now it’s possible, and easy, to implement spiders using your preferred programming language.

    I received awesome support, and I’d like to thank my mentors (@eLRuLL and @ptremberth) and another GSoC student (@Preetwinder) for all the help on this project. We’ve discussed a lot about its usage, best practices, and coding.

    I’m a big fan of the Open Source community, so I’ll be happy to continue contributing to the Scrapy Streaming repository and some related projects after this summer.

    Thank you !!,

    Aron Bordin.

    Scrapy-Streaming - Support for Spiders in Other Programming Languages was originally published by Aron Bordin at GSoC 2016 on August 20, 2016.

    by Aron Bordin (aron.bordin@gmail.com) at August 20, 2016 06:27 PM

    mike1808
    (ScrapingHub)

    GSOC 2016 #6: Final Submission

    Last weeks

    During the last weeks I was actively fixing bugs, adding small features, refactoring the existing code and writing docs.

    I’ve refactored my event listeners approach to support element.node:addEventListener() and element.node:removeEventListener(). Now, you can add more than one event listener for the specified event, which, I think, is great.

    Also, I’ve refactored the HTML element creation process. Now, HTML elements are created in the tab#evaljs method, which is the main method for evaluating JavaScript code across the application. Because of that you can do something like:

    local div = splash:evaljs('document.createElement("div")')
    div.node.id = 'myDiv'
    div.node.style.background = '#ff00ff'
    splash.select('body').node:appendChild(div)
    

    And, finally, documentation. I wrote about 800 lines of documentation and now my PR has almost 4,000 changed lines. I understand how hard it is to review and I appreciate how much time the reviewer has to spend on it.

    Final Submission

    During this summer I was working on the following pull requests for Splash:

    splash:with_timeout()

    • Pull Request: srapinghub/splash#465
    • Status: fully implemented and merged.
    • Abstract: allows setting a timeout on any Lua function. It lets the wrapped function run only for (sometimes a bit more than) the specified amount of time.
    • Blog post: post
    • Comments: Implementing this helped me to deeply understand how Splash scripts work and, particularly, how the script event loop is implemented.

    DOM elements: take 1

    • Pull Request: srapinghub/splash#471
    • Status: switched to another API interface, closed.
    • Abstract: allows manipulating DOM elements in Lua scripts.
    • Blog post: post
    • Comments: I have created a variant of the API that allows working with DOM elements in the following manner:
    local form = splash:select('form')
    local input = splash:select('form > input[name="name"]')
    local ok, name = assert(form:node_property('value'))
    assert(input:field_value('mr. ' .. name))
    local ok, submit = assert(form:node_method('submit'))
    submit()
    

    As you can see, you have to write quite a lot of code for (in this case) getting the form values, changing them and then submitting the form. And that was the main reason to change the API interface.

    Bug #1

    • Pull Request: srapinghub/splash#482
    • Status: fixed and merged.
    • Abstract: In Lua, all methods of all tables of a particular class were bound to one Python object.
    • Blog post: post
    • Comments: This bug was pretty significant for my further work as it wouldn’t allow working with multiple elements at the same time:
    local el1 = splash:select('#el1') 
    local el2 = splash:select('#el2') 
    
    el1:form_values() -- calls el2:form_values()
    

    I’ve changed the following things: 1. Getters and setters of the table (that are assigned from Python) are now assigned to the table itself rather than being kept in a closure scope. 2. Changed the __index and __newindex metamethods so that they are assigned to the table itself rather than to its metatable. 3. Made the table’s metatable a metatable of itself.

    Overall during fixing this bug I’ve learned many things about how OOP and metamethods works in Lua.

    Bug #2

    • Pull Request: srapinghub/splash#487
    • Status: succeeded by another fix implementation, still open.
    • Abstract: Private methods in Lua are bound to the last wrapped exposed object.
    • Comments: The problem was with private methods of exposed objects (objects that are exposed from Python to Lua). They were bound to the last created object. I tried to fix this bug by assigning these private methods to the exposed object (table) itself in the __private method. However, in that case they could be accessed from user scripts. One of my mentors, @immerrr, did a fantastic and mind-blowing job fixing this bug in srapinghub/splash#495; you should have a look if you are familiar with Lua and its OOP implementation :smile:.

    DOM elements: take 2

    • Pull Request: srapinghub/splash#490
    • Status: wip, fixing last comments on Pull Request, open.
    • Abstract: Allows to manipulate DOM elements using Lua properties and methods.
    • Comments: after discussion with my mentors we decided to make the API as simple as possible. In comparison with the old API, in the new one you can set node properties just by assigning to the table and call node methods by just calling Lua methods. You can even assign event handlers as pure Lua functions:

    The previous example with the new interface:

    local form = splash:select('form')
    local input = form.node:querySelector('input[name="name"]')
    input.node.value = 'mr. ' .. input.node.value
    assert(input.node.parentElement:submit())
    

    And this is how you can assign event handlers (with addEventListener):

    local button = splash:select('button')
    local tooltip = splash:select('.tooltip')
    button.node:addEventListener('click', function(event)
        event:preventDefault()
        tooltip.node.style.left = event.clientX
        tooltip.node.style.top = event.clientY
    end) 
    

    Pretty impressive, huh? :smile:

    Overall

    I finished almost everything that I was planning to do. Some of the features that I had planned were implemented by other Splash contributors (the event system reworking) or were already present in the project (the plugin system). However, I also did more than I was supposed to, like splash:with_timeout(), which originally was only going to be a flag for the splash:go() command, and event listeners for elements.

    Final Thoughts

This summer was the most intensive one of my entire life. I was working almost every day of the week and I don’t regret it.

    GSOC was a unique experience for me for several reasons.

First of all, I’ve never worked with Python and Lua in such a big project as Splash. During the coding period my Python and Lua skills increased significantly. I want to thank Mikhail and Dennis for helping me with every question I asked. You guys rock :sunglasses: :muscle:.

The second reason is working on an open source project, particularly writing so much documentation and so many tests. I’d never thought it could take so much time. On the other hand, while writing tests and documentation you understand the feature you implemented better and notice things that would be good to add or remove.

Finally, thank you Google for giving such an opportunity to students all over the world to code and learn.

    August 20, 2016 12:00 AM

    August 19, 2016

    Abhay Raizada
    (coala)

    Final Submission

This program was one of the best learning experiences I’ve ever had. During the entire GSoC phase I was able to contribute mainly to two repositories of the coala organisation, a sub-org under the Python Software Foundation.

    These include commits on the coala repository:

    https://github.com/coala-analyzer/coala/commits?author=abhsag24

    and the coala-bears repository:

    https://github.com/coala-analyzer/coala-bears/commits?author=abhsag24

coala-bears also contains a branch which hasn’t been merged into master yet; all my commits on this branch can be found here:

    https://github.com/coala-analyzer/coala-bears/pull/653/commits.

I’ve been really grateful to my mentor Fabian and the admin of my sub-org, Lasse; I’ve had the most enriching discussions with these people 😀. Cheers to coala, FOSS and the entire GSoC experience. I’ll definitely look forward to working beside these people after the program as well.


    by abhsag at August 19, 2016 06:55 PM

    Prayash Mohapatra
    (Tryton)

    Concluding Thoughts

    Had a great journey this summer with Tryton. Really loved contributing to open source properly. Contributing these many days has made me happy. Happy for being able to give back to what I use. Thankful that I stumbled upon PyCon’s videos. Going through them is really fun. Hoping to contribute in python next time. :)

As for Tryton, Sergi and Cédric were always there with their quick responses on the IRC. Sergi’s replies sometimes made me smile instantly :D I wish I could understand the Spanish videos on the Tryton YouTube channel. I am planning to learn some basic Spanish now xD After an entire day with code, human connection felt like bliss.

    Code Review #27421002 is where my work can be checked. This link will always have a special place in my heart.

    To infinity and beyond. adiós!

    August 19, 2016 05:28 PM

    ghoshbishakh
    (dipy)

    Google Summer of Code Progress August 19

    So we are at the end of this awesome summer and this post is about the progress in my final weeks of GSOC 2016! And the major addition in this period is the development stats visualization page.

    GitHub stats visualization

As we had planned, the new Dipy website needed a page to highlight the growing number of developers and their contributions to the Dipy project. And finally we have achieved that with a separate django app that creates visualizations with data pulled from the GitHub API, and for drawing some neat graphs I have used the Chart.js library.

    dipy github visualization page


And hey, it’s a separate django app!

    So it can be integrated easily into any other django project! Simply copy the github_visualization folder into your project and add github_visualization to the INSTALLED_APPS list in settings.py.
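For reference, here is a minimal settings.py sketch (assuming the app folder keeps its default name, github_visualization):

    # settings.py -- register the app so its templates and template tags are found
    INSTALLED_APPS = [
        # ... your other apps ...
        'github_visualization',
    ]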

    Now you just need to add a couple of lines to the template in which you want to show the visualizations.

    <!-- load the template tags for github_visualization -->
    {% load github_stats_tags %}
    
    <!-- load css and js -->
    {% include "github_visualization/github_stats_includes.html" %}
    
    <!-- render the visualizations -->
    {% github_stats_block "username" "repository_name" %}

    Just change the ‘username’ and ‘repository_name’ to point to the GitHub repository you want to see visualizations for.

The work was submitted through pull request #15.

    August 19, 2016 03:30 PM

    Ranveer Aggarwal
    (dipy)

    To New Beginnings!

    [List of Commits]   [Pull Request]   [Link to Blogs]

    Winter is coming!

    This blog post marks the end of an amazing summer, the summer of '16.
    This summer, I got an opportunity to share knowledge, views and code with some of the most brilliant people I have come across, contributing to an amazing application that has the potential to take neuroscience to the next level.

    My Google Summer of Code project with DIPY, under the umbrella of Python Software Foundation is one of the most interesting projects that I have ever worked on, which was reason enough to keep me motivated through all these months.

    The Project

Currently, if you have an OpenGL interface, you need to use Qt/GTK or some other UI library to create a window and focus out of the OpenGL window to do simple UI tasks like filling forms, clicking a button, or saving a file. Our idea is to get rid of the external interfaces and have the UI built in.
    So, all the interaction happens within the 3D world interface.

Currently, no library in Python offers such functionality, which is much needed for scientific visualisations.
    So, we built this cross platform minimal interface on top of VTK, which provides a very simple but powerful API, and can be tweaked to create futuristic interfaces.

    Pre-GSoC

    I had heard about DIPY through a friend of mine (who himself is a Google Summer of Code student for DIPY this year) and I was intrigued by what they were doing. Over my years as an undergrad, I had developed an interest in Visual Computing and this was one organisation that was doing some really cool work in the domain. So I headed over to have a look at the list of projects and sent across a mail to my (to-be) mentor, Eleftherios Garyfallidis.

    He and Marc had an idea of building a futuristic GUI for DIPY using VTK. I loved the idea and I instantly began working on making it a reality. And over the course of the next 3 months, we came really close to it.

    Getting Started

As with any project, the first step is setting things up. This took some time given that I was on OS X and my mentors were on Linux. Setting up VTK on OS X is a really convoluted procedure, but finally, working together, we got it up and running in a week or two.

VTK is an amazing framework, but there isn’t much documentation and there are few learning resources (for a complete newbie) on the internet. Therefore, I got off to a slow start. Over time, I realised that there is a world beyond documentation and StackOverflow, and that open mailing lists are one of the best things that happened to Open Source and software development in general.

    And then there were these situations.

    After nearly a month of struggling, we were finally able to get a click-able button working.

    The Summer

    Once we had a working button ready, the subsequent work went on smoothly. Every time we modelled a new UI element, we found out a better way to do the whole thing, and ended up rewriting older elements. There were times when we completely rewrote elements to add a single small functionality. We let go of a lot of code and came up with simpler and more efficient ways to do things.

Since we were building a programmable UI, we tried to keep each element as generic as possible, exposing as many parameters to the user as we could. To keep things simple, we also gave these parameters the best possible default values, so that the user can simply instantiate a UI element object without compromising the amount of control he/she has over it.

    When I say smooth, I don’t mean there weren’t roadblocks. There were several times when I was stuck on a problem for days, but my mentors never shied away from discussing the problems with me, and I could ping them any time to get a solution.

    3 months hence, we have a good UI framework in place, built entirely in VTK-Python and it’s built in such a way that it can potentially be plugged into any VTK-based application.

    Tech

The project involved knowledge of how OpenGL works, a good knowledge of Python, the ability to read through documentation and mailing list archives, and 3D coordinate geometry (all that effort 5 years ago finally paid off!).

    My Contributions

A complete list of all my commits to the project can be found here.
    Here’s the PR [#1111] with all the changes.
    Here’s what we did:

    Building a Button

    Using vtkTexturedActor2D we built a button with functionalities to change icons and add callbacks. This is what we got.

    A button overlay

    A Text Input Field

    We built a textbox using vtkTextActor and added ways to edit the text in the text box. Starting from an editable actor, we ended up with a multi-line text box. We rewrote a lot of code while building this and this is where we ended up with the idea of having a generic UI super class for all the UI elements. This element also introduced ui parameters (to pass between the element and the interactor) which were later deprecated.

    A text box

    Line Sliding

    While building the line slider, we realised the need for multiple elements within one element. This is where the idea of a common set_center method came up. This element also introduced changes in the way we added elements to the renderer. We also introduced a ui_list for each element that carries all the sub-elements in that element. We ended up with this.

    The Line Slider

    Circular Slider

    Using techniques similar to above, we built a circular slider, using a lot of math. The circular slider underwent a lot of modifications while adding it to the panel because we wanted to maintain a constant value while moving it around.

    The Circular Slider

    3D Menus

    Moving on, the idea was to use the existing elements in 3D. Using vtkAssembly and vtkFollower, the former taking a lot of time to understand (but thanks to efforts by my mentors, it turned out to be not so convoluted), we successfully ported several 2D elements to 3D. We couldn’t do 3D sliding, so that is something we will be appending to future work.

    The Orbital Menu

    A More Complex Orbital Menu

    A 2D Panel

A 2D Panel is basically a collection of 2D elements. Built in such a way that the elements keep their relative positions regardless of the size of the panel, the panel turned out to be more useful than we thought after we managed to set up a panel of panels.
We also used set_center recursively to move the panel with all its elements around, and to align panels to the left or right of the screen.

    A Panel

    A Right-Aligned Panel

    A File Menu

    The time had come to build a file dialog. Using os and glob.glob we built a file menu for displaying files in the current folder and changing directories when clicked.
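A rough sketch of the kind of listing logic involved (a hypothetical helper in the spirit of the post, not the actual DIPY code):

    import os
    import glob

    def list_directory(path):
        # Directories first (clicking one changes the current folder), then plain files.
        entries = sorted(glob.glob(os.path.join(path, '*')))
        directories = [e for e in entries if os.path.isdir(e)]
        files = [e for e in entries if os.path.isfile(e)]
        return directories, files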

    The File Menu

    A File Dialog

To put everything we had done to the test, we built a file save dialog. This used almost everything we had built till now - panels, buttons, text box, etc. Here, for inter-object communication, we introduced optional parent references for each element. In the end, it all worked out well :)

    The File Menu

    Future Work

    Here’s what we want to do in the future:

    • Build robust 3D elements
    • Convert the prototype elements to futuristic looking elements
    • Add unit tests - we don’t have a unit testing framework right now, we are looking for one

    Acknowledgements

While the final certificate will bear my name, countless brilliant brains went into this project.

    Both my mentors, Eleftherios Garyfallidis and Marc-Alexandre Côté have stood by my side, all through the summer, with regular (and at times more than that) meetings, reviews and suggestions. I could ping them at any odd time of the day and they would promptly reply to all my doubts. The Story of the Rabbit’s PhD Thesis holds true :D

The people who built VTK and those who built the Python wrapper for it have done an amazing job. It’s an amazing framework, still in development. I am also indebted to the people who discuss all their doubts online and leave breadcrumbs leading to the right resource.

    And lastly, my colleagues who gave me valuable feedback like, “Try red!”.

    Thank you all!

    August 19, 2016 02:50 PM

    udiboy1209
    (kivy)

    How To Use Tiled Maps In KivEnt

    This is a tutorial for using the maps module in kivent to load Tiled maps. It covers the steps required to load and display a map on the kivent canvas, but it doesn’t cover how to make a map in Tiled.

    Building and installing the module

    Make sure you have kivy and its dependencies properly set up and working using these installation instructions.

    Clone the kivent repository to obtain the source. The module is currently in a separate branch ‘tiled_gsoc_2016’ so you can clone that branch only.

    git clone -b tiled_gsoc_2016 https://github.com/kivy/kivent.git
    

    You can skip the -b tiled_gsoc_2016 if you want the whole repository.

    Install kivent_core first. Assuming you cloned it in a dir named ‘kivent’

    $ cd kivent/modules/core
    $ python setup.py build_ext install
    

    Then install the maps module similarly

    $ cd kivent/modules/maps
    $ python setup.py build_ext install
    

    It is best to set up kivy and kivent in a virtual environment. Just make sure you use the correct python for the above commands. The module works best with python3, but it works with python2 too.

    Setting up the KV file

    We need a basic setup of the gameworld and a gameview where we will add the renderers to be displayed. We also need to add systems which the tiles depend on like PositionSystem2D, ColorSystem and MapSystem.

    TestGame:
    
    <TestGame>:
        gameworld: gameworld
        camera1: camera1
        GameWorld:
            id: gameworld
            gamescreenmanager: gamescreenmanager
            size_of_gameworld: 250*1024
            system_count: 4
            zones: {'general': 100000}
            PositionSystem2D:
                system_id: 'position'
                gameworld: gameworld
                zones: ['general']
            ColorSystem:
                system_id: 'color'
                gameworld: gameworld
                zones: ['general']
            MapSystem:
                system_id: 'tile_map'
                id: tile_map
                gameworld: gameworld
                zones: ['general']
            GameView:
                system_id: 'camera1'
                gameworld: gameworld
                size: root.size
                window_size: root.size
                pos: root.pos
                do_scroll_lock: False
                id: camera1
                updateable: True

PositionSystem2D is necessary for any map because it is responsible for the tile positions. MapSystem holds the relevant data for the map, hence that is necessary too, obviously. ColorSystem is required if there are shapes in your map which need coloring. And GameView is the canvas where we will render the map’s layers.

    This is the basic boilerplate KV necessary for rendering the map.

    Setting up the Systems

    I will start with the basic game app structure of main.py.

    def get_asset_path(asset, asset_loc):
        return join(dirname(dirname(abspath(__file__))), asset_loc, asset)
    
    class TestGame(Widget):
        def __init__(self, **kwargs):
            super(TestGame, self).__init__(**kwargs)
    
            # Init gameworld with all the systems
            self.gameworld.init_gameworld(
                ['position', 'color', 'camera1', 'tile_map'],
                callback=self.init_game)
    
        def init_game(self):
            self.setup_states()
            self.set_state()
    
        def setup_states(self):
            # TODO we need to add the state of the systems to this gameworld state
            self.gameworld.add_state(state_name='main')
    
        def set_state(self):
            self.gameworld.state = 'main'
    
    class YourAppNameApp(App):
        pass
    
    if __name__ == '__main__':
        YourAppNameApp().run()

We now need to load the systems required for each layer. We will have to specify parameters for them the same way we do it in KV files. We will make 3 dicts, one each for Renderer, PolyRenderer and AnimationSystem, and pass them to the load_map_systems util function to create 4 layers.

        def __init__(self, **kwargs):
            super(TestGame, self).__init__(**kwargs)
    
            # Args required for Renderer init
            map_render_args = {
                'zones': ['general'],
                'frame_count': 2,
                'gameview': 'camera1',
                'shader_source': get_asset_path('positionshader.glsl', 'assets/glsl')
            }
            # Args for AnimationSystem init
            map_anim_args = {
                'zones': ['general'],
            }
            # Args for PolyRenderer init
            map_poly_args = {
                'zones': ['general'],
                'frame_count': 2,
                'gameview': 'camera1',
                'shader_source': 'poscolorshader.glsl'
            }
    
            # Initialise systems for 4 map layers and get the renderer and
            # animator names
            self.map_layers, self.map_layer_animators = \
                    map_utils.load_map_systems(4, self.gameworld,
                            map_render_args, map_anim_args, map_poly_args)

    We will be returned a list of renderers and animators. This list can be added to the gameworld init sequence like so. Renderers and animators require specific states to be set so we have to add these lists while setting states. Modify the corresponding lines with these.

            # Init gameworld with all the systems
            self.gameworld.init_gameworld(
                ['position', 'color', 'camera1', 'tile_map']
                + self.map_layers
                + self.map_layer_animators,
                callback=self.init_game)

    and

        def setup_states(self):
            # We want renderers to be added and unpaused
            # and animators to be unpaused
            self.gameworld.add_state(state_name='main',
                    systems_added=self.map_layers,
                    systems_unpaused=self.map_layer_animators + self.map_layers)

    These systems need to be rendered from bottom to top to preserve the layer order. And the gameview camera handles rendering of these systems. So we will set the render order for that camera to match layer index. Add this line to __init__.

            # Set the camera1 render order to render lower layers first
            self.camera1.render_system_order = reversed(self.map_layers)

    Loading the TMX file

Next up, we need to populate our systems with entities, and for that we need a TileMap loaded with tile data. This data will be obtained from the TMX file. The util module has a function for loading TMX files and registering them with the map manager.

        def setup_tile_map(self):
            # The map file to load
            # Change to hexagonal/isometric/isometric_staggered.tmx for other maps
            filename = get_asset_path('orthogonal.tmx','assets/maps')
            map_manager = self.gameworld.managers['map_manager']
    
            # Load TMX data and create a TileMap from it
            map_name = map_utils.parse_tmx(filename, self.gameworld)

setup_tile_map() should be called from init_game(), the callback passed to init_gameworld(), so that it runs after the gameworld has been initialised.

    parse_tmx takes the filename of the TMX, loads it to a TileMap, registers it in the map_manager with name as the filename without the extension, and returns that same name.

    Creating Entities in the GameWorld

All that is left to do is create entities from this TileMap. The function for that is init_entities_from_map. It requires the TileMap instance and a reference to the gameworld’s init_entity function. It is used like this:

            # Initialise each tile as an entity in the gameworld
            map_utils.init_entities_from_map(map_manager.maps[map_name],
                                           self.gameworld.init_entity)

    You can add this to setup_tile_map after parse_tmx is called and we have the TileMap.
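Putting the two snippets together, the complete setup_tile_map is just a few lines (a sketch assembled from the pieces above):

        def setup_tile_map(self):
            # Load the TMX, register the resulting TileMap with the map manager,
            # then create one entity per tile in the gameworld.
            filename = get_asset_path('orthogonal.tmx', 'assets/maps')
            map_manager = self.gameworld.managers['map_manager']
            map_name = map_utils.parse_tmx(filename, self.gameworld)
            map_utils.init_entities_from_map(map_manager.maps[map_name],
                                             self.gameworld.init_entity)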

    This is all we require to load a Tiled map in KivEnt.

    Download the source files from here!

    Thank you and happy tiling!


    EDIT (2016-08-24): Added installation instructions.

    August 19, 2016 12:00 AM

    jbm950
    (PyDy)

    GSoC Week 14

    This is the final week of GSoC! I have the write up of the work done on a different page.

This project has been a wonderful learning experience and I have met many wonderful people over the course of this summer. Before the summer began I had never written any test code. Now at the conclusion of the summer I have written benchmark tests and I try to make sure the code I write has complete unit coverage. I feel that this is one of my biggest areas of improvement over the past 3 months. I have also learned a considerable amount about the combination of programming and dynamics. Prior to GSoC my experience with dynamics consisted solely of finding the equations of motion by hand. During my time in GSoC I have been exposed to finding the equations of motion programmatically and using the results with an ODE integrator. I have also obtained a more in-depth knowledge of different algorithms for creating the equations of motion.

    Another thing that I have enjoyed in this experience is seeing how a large programming project works. I feel that this will make me more employable in the future as well as allow me to help any other new open source community get up and running.

    Lastly, one of the big highlights of my summer was my trip to Austin, TX for Scipy 2016. Being a GSoC student I not only got to go to my first professional conference but I was able to help give one of the tutorials. Also I was able to meet many of the other people who work on sympy. This went a long way in making my contributions to sympy from being just contributions to some organization to working on a project with other people. It was also neat to meet the developers of other open source projects. Like my comment about sympy, I now see those projects less as being “project name” and more as the people actually working on them.

    I have enjoyed my GSoC experience and if I were eligible I would not hesitate to apply again. I would like to thank those who chose me and made this summer of learning and experiences possible.

    August 19, 2016 12:00 AM

    fiona
    (MDAnalysis)

    Coding and Cats - the end of GSoC

Hello again! It’s been a while since my last post and there’s a lot I’d have liked to talk about more if I’d found the time, but with the end of the coding period only a handful of days away, let’s take a look at what I’ve managed to achieve throughout GSoC.

My GSoC project has been divided into three main parts; you can find all the code I’ve been writing (including documentation, examples, and discussion!) by following the links below to the pull request I made for each, and I’ll also give a little summary below about what’s possible with each part and the possible future changes and improvements (in addition to general things like more testing and fixing any bugs that pop up). Only one part has reached the stage of merging with MDAnalysis, but the current versions of the other two are at least working and, with a bit more work post-GSoC, will hopefully follow soon!

    Links to code (pull requests)


    Add Auxiliary

    This part involved adding an ‘auxiliary’ module to MDAnalysis to allow additional timeseries data from a simulation to be read alongside the trajectory. You can read more about it in this post, or to see more about how it’d work in practice, here’s the documentation.


    Make bundle

    This part (originally planned as a less-general ‘umbrella class’) involved adding a function to let us group together any related simulations and their various ‘metadata’ and ‘auxiliary data’, to make it easier to keep track of and perform analysis across these simulations. You can read more about it in this post.


    Run WHAM

    The last part involved writing a function to allow WHAM analysis to be performed in MDAnalysis. I didn’t get around to making a post looking at this part in particular, but you can check out the original discussion on github. I mentioned WHAM back in this post; it’s an algorithm we can use to take data from a set of umbrella sampling simulations and calculate the ‘Potential of Mean Force (PMF)’ profile showing the (one-dimensional) energy landscape of the system we’re simulating.

    Rather than write a new implementation, I’ve been writing a wrapper for Alan Grossfield’s implementation. Grossfield’s WHAM is run from the command line and uses input files following a particular format. The idea of my wrapper is to allow WHAM to be run in a python script alongside other analysis, remove the need for specific file-formatting, and offer some additional options such as start and end times or changing energy units.


    Umbrella Sampling: bringing it all together

    The main personal driving force behind my project was to be able to simplify analysis of Umbrella Sampling simulations in MDAnalysis. So how does the work I’ve done pull together to achieve that? Let’s let our final kitties of GSoC demonstrate:

    (Can’t remember why we want Umbrella Sampling simulations and PMF profiles? Go back to this post).

    Again, there’s still a little work left before this is fully realised within MDAnalysis – but when I find time post-GSoC I’ll definitely be working to get this done!


    And finally…

Despite many moments of frustration I’ve enjoyed GSoC: I’ve definitely improved my coding skills and I’ve (mostly) built something I’m likely to actually use for my own research! A huge thanks to my mentors and everyone else at MDAnalysis for help, ideas and discussion along the way.

I had a lot of fun drawing my little cat friends to help make this blog interesting; hopefully it’s been fun and informative for you as well! I’d like to keep posting here (though it’s likely to be even less frequent), so keep an eye out if you’re interested - at the very least, I’ll make a post when ‘make bundle’ and ‘run wham’ are finished.

    Thanks for following along GSoC with me and for putting up with all my cats, and (hopefully) see you again sometime!

    August 19, 2016 12:00 AM

    August 18, 2016

    Ramana.S
    (Theano)

    chrisittner
    (pgmpy)

    GSoC Project Status Quo

GSoC 2016 is coming to an end and I’ve just sent the last PR necessary to complete the scope of my proposal. It has been an exciting project, and I do feel that I learned a lot. I was able to implement a number of basic BN structure estimation algorithms that I had wanted to study for a long time. Once all the reviewing is done, pgmpy will support basic score-based structure learning, with the usual structure scores (BIC, BDeu, K2), exhaustive search and local heuristic search (hill climbing) with tabu search, edge blacklists and whitelists, and indegree restriction.

    It will also support basic constraint-based structure learning with conditional chi2 independence tests. I implemented the PC algorithm and a PDAG completion procedure (under review). Finally, MMHC, a hybrid learning algorithm is also implemented (under review). In the beginning of the project I also worked on Bayesian parameter estimation for BNs.
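For illustration, here is a rough sketch of how the score-based estimators are meant to be used (class names as in pgmpy's estimators module; since some of the PRs are still under review, the exact call signatures may differ):

    import pandas as pd
    from pgmpy.estimators import HillClimbSearch, BicScore

    # Toy data with three binary variables.
    data = pd.DataFrame({'A': [0, 0, 1, 1],
                         'B': [0, 1, 0, 1],
                         'C': [0, 1, 1, 1]})

    # Hill climbing with the BIC score; estimate() returns the learned network.
    hc = HillClimbSearch(data, scoring_method=BicScore(data))
    best_model = hc.estimate()
    print(best_model.edges())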

While the estimators/ folder is already less empty than before, a lot remains to be done; pgmpy learning does not yet have:

    • learning for MarkovModels
    • learning for continuous networks
    • better support for learning from incomplete data
    • Chow-Liu tree-structure BN learning <- an efficient algorithm that can find the optimal tree network given data. Sounds great, then we get tree-bayes classification as well.
    • Strong documentation & show cases to get more people interested in pgmpy

    Tomorrow, I’ll share another post with my key learnings about BNs!

    by Chris Ittner at August 18, 2016 10:00 PM

    Raffael_T
    (PyPy)

    It's alive! - The final days of GSOC 2016

    I promised asyncio with async-await syntax, so here you have it :)
I fixed all the bugs I could find (quite a lot, to be more exact, which is normally a good sign), and as a result I could run some programs using asyncio with async-await syntax without any error, with the same results CPython 3.5 would give.
I implemented all of CPython’s tests for the async-await syntax and added some tests to check that everything works well when combined in a more complex program. It’s all working now.
    The GIL problem I described earlier was just an incompatibility of asyncio with the PyPy interpreter (pyinteractive). The error does not occur otherwise.
I have been working on the py3.5-async branch in the PyPy repository lately, and that is also where I did all my checks that everything is working. I also merged all my work into the main py3.5 branch. This branch was really broken though, because there are some major changes from py3k to py3.5. I fixed some things in there, and the PyPy team managed to fix a lot of it as well. My mentor sometimes took time to get important things to work, so my workflow wouldn’t get interrupted. While I was working on py3.5-async, a lot of fixes were done on py3.5, changing the behaviour a bit and (possibly) breaking some other things. I have not checked everything from my proposal on this branch yet, but with some help I think I managed to get everything working there as well. At least it all looks very promising for now.
Next to that, I also managed to fix some issues I found in the unpack feature of my proposal. There were some special cases which led to errors: for example, for a function defined like this: def f(*args, **kwds), and a call like this: f(5, b=2, *[6, 7]), I got a TypeError saying “expected string, got list object”. The problem here was that certain elements had a wrong order on the stack, so pulling them normally would not work. I made a fix checking for that case; there are probably better solutions but it seems to work for now.
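For reference, this is the behaviour CPython 3.5 gives for that call pattern, which the PyPy fix has to match:

    def f(*args, **kwds):
        return args, kwds

    # Explicit positional arguments come first, then the unpacked iterable;
    # keyword arguments are collected separately.
    print(f(5, b=2, *[6, 7]))   # -> ((5, 6, 7), {'b': 2})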


    I will probably keep working on py3.5 for a little bit longer, because I would like to do some more things with the work of my proposal. It would be interesting to see a benchmark to compare the speed of my work with cpython. Also there is a lot to be done on py3.5 to work correctly, so I might want to check that as well.



    Here is a link to all of my work:
    https://bitbucket.org/pypy/pypy/commits/all?search=Raffael+Tfirst



    My experience with GSOC 2016
It's really been great to work with professional people on such a huge project. My mind was blown as I witnessed how much I was able to learn in this short time. The PyPy team was able to motivate me to achieve the goal of my proposal and got me a lot more interested in compiler construction than I already was.
I am glad I took the chance to participate in this year's GSoC. Of course my luck in having such a helpful and productive team on my side is one of the main reasons why I enjoyed it so much.

    by Raffael Tfirst (noreply@blogger.com) at August 18, 2016 09:33 PM

    Aakash Rajpal
    (italian mars society)

    Final Week Final Blog!!

Hey everyone, as the project submission deadline nears I have been working to finish my project. The Heads-Up Display (HUD) is rendering perfectly fine on the Oculus, interacting with the Leap Motion and receiving data from the server. Till now, I was able to integrate the HUD with my own demo model in Blender and simulate it on the Oculus. The final step involved integrating the HUD with one of the model scenes available in the IMS V-ERAS repository.

This was a difficult task, as the Italian Mars Society initially started to simulate models through the Blender Game Engine for the DK1, but as Oculus support for Blender was very limited they decided to shift to Unity instead. Currently they are working only in Unity, and as I am working under the PSF organization, using Unity was not an option. Also, since there have been a lot of changes from the Rift DK1 to the DK2, most of the models were unable to render successfully on the Oculus.

I had a little chat with my mentor about this issue, and he asked me to integrate the HUD with the models that were fully rendered on the DK2 and forget about the other ones. After that, I found an avatar model and began work to integrate the HUD onto it.

    Couple of days later, I was able to render the HUD on to the Oculus with one of V-ERAS models through Blender Game Engine and the result seems good.

(Image: HUD)

     

     

    Currently, I am trying to make the HUD an addon in Blender Game Engine so that it can be imported into any model/scene and render successfully on the Oculus.

As for the final submission, I have started with the documentation and hopefully I will submit by the end of this week.

    Final Post😦

     

     


    by aakashgsoc at August 18, 2016 07:12 PM

    Preetwinder
    (ScrapingHub)

    GSoC-6

    Hello,
    This post continues my updates for my work on porting frontera to python2/3 dual support.
This will be my final post. I have achieved my goal satisfactorily: frontera now works under both Python 2 and Python 3 for all the different scenarios, such as with different DBs (Postgres, MySQL, HBase) and with different message bus implementations (Kafka, ZMQ). The latest release of Frontera - 0.6.0, now available on PyPI - contains these changes.

I have also substantially increased the test coverage; now almost all the major components (Workers, Backends, Manager etc.) are tested.
    Here is a link to all of my commits

This period of 3 months has been invaluable to me, and I have learned a lot during it. I am greatly thankful to my mentors Alexander Sibiryakov and Paul Tremberth for working with me and helping me through this period. I would also like to thank my sub-organization Scrapinghub and my organization Python Software Foundation for selecting me. Finally, I would like to thank Google for providing this opportunity to me and others.

    GSoC-6 was originally published by preetwinder at preetwinder on August 18, 2016.

    by preetwinder (you@email.com) at August 18, 2016 04:45 PM

    Vikram Raigur
    (MyHDL)

    Backend Top Module

    The Back-end Top module connects quantizer, RLE, Huffman and Byte-Stuffer Modules.

The back-end has a small FSM sitting inside which makes all the modules run in parallel, i.e. when the Byte-Stuffer sends a ready signal to the Huffman module, the Huffman module sends ready to the RLE, and the RLE sends a ready signal to the Quantizer.

The back-end has an input buffer attached which takes data from the front-end. The buffer has a size of 3*64, so that the front-end does not have to wait for the back-end to be ready.

Among the back-end modules, the Huffman module takes around 80 clock cycles, so each block has to wait until those cycles finish.


    by vikram9866 at August 18, 2016 02:35 PM

    ByteStuffer

The Byte Stuffer module checks for 0xFF bytes and appends 0x00 after each one. The Byte Stuffer module is implemented and already merged in the main repo.
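As a quick behavioural sketch (plain Python, not the MyHDL implementation itself), the transformation is:

    def byte_stuff(data):
        # Insert 0x00 after every 0xFF in the entropy-coded data so that
        # a decoder does not mistake it for a JPEG marker.
        out = bytearray()
        for byte in data:
            out.append(byte)
            if byte == 0xFF:
                out.append(0x00)
        return bytes(out)

    print(byte_stuff(b'\x12\xff\x34').hex())   # 12ff0034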


    by vikram9866 at August 18, 2016 02:31 PM

    Valera Likhosherstov
    (Statsmodels)

    GSoC 2016 #Final

    Prediction and forecasting

    The last challenge I faced during GSoC was to implement Kim prediction and forecasting. At first it appeared to be quite difficult, because I dealt with both mathematical and architectural issues. Here's a short overview of the subject.

    Prediction maths

    Kalman filter in Statsmodels has three modes of prediction:
    • Static prediction. This is prediction of the next observation based on current. This type of prediction is equivalent to usual Kalman filtering routine.
    • Dynamic prediction. Still don't get its purpose.
    • Forecasting. Mathematically speaking, forecast of the out-of-sample observation is its expectation conditional on known observations.
    My goal was to implement two of these types - static prediction and forecasting in case of switching regimes, i.e. construct Kim prediction and forecasting device.
I haven't had any problems with static prediction, because it's also equivalent to Kim filtering. But the forecasting issue is not covered in [1], so I had to use my own intelligence to come up with a mathematical routine calculating the future data expectation and covariance, conditional on known observations. Basically it's just a Kim filter, but without the Hamilton filtering step and with the underlying Kalman filters working in prediction mode.

    Prediction code

My laziness produced an intention to write less code and reuse the existing. The idea was to somehow reuse Kalman filter prediction, but when I came up with the correct forecast routine I understood that this wasn't possible. So I had to implement the whole routine myself; it is located in the kim_filter.KimPredictionResults._predict method. Luckily the prediction code is much more compact than the filter's. Also, I didn't have to care so much about optimality, since prediction doesn't take part in likelihood optimization.

    Some nice pics

    Since no test data is available for forecasting, I used iPython notebooks as a sort of visual testing.
    I added pictures of static (one-step-ahead) prediction and forecasting to Lam's model and MS-AR model notebooks, they look sensible and my mentor liked them:
    (this is for Lam's model)
    (and that's for MS-AR)
    Forecast's variance is constant, because it's fast to find a stationary value.

    GSoC: Summary

I'm proud to say that I've completed almost all items of my proposal, except for constructing a generic Kim filter with an arbitrary number r of previous states to look up. But this is due to performance problems, which would require more time than GSoC permits to solve. In detail, the implemented pure-Python r=2 case works slowly and is to be rewritten in Cython.
Anyway, a good advantage of my Kim filter implementation, as mentioned by my mentor, is using logarithms to work with probabilities. It gives a big improvement in precision, as I conclude from testing.
    A broad report on what's completed and what's not can be found here in github comments.

    GSoC: My impressions

GSoC has surely increased my level of self-confidence. I've done a lot of nice work, written 10k lines of code (I was expecting much less, to be honest), and met many nice people and students.
I have to admit that GSoC turned out to be easier than I thought. The most difficult and nerve-racking part of GSoC was building a good proposal. I remember that I learned a lot of new material in a very short time - I even had to read books about state space models during vacation, from my smartphone, sitting on a plane or a subway train.
I also started working on my project quite early - in the beginning of May. So I had about 60% of my project completed by the midterm, before I had even started working full time, because my school and exams finished only by the end of June.
So I worked hard during July, spending days in the local library, but still, I think I never worked 8 hours a day. Eventually, by the beginning of August I had completed almost everything; the only thing left was prediction and forecasting, discussed previously in this post.
I dreamed about taking part in GSoC since I was a sophomore, and I'm glad I finally did it. The GSoC code I produced is definitely the best work I've ever done, but I hope to do more in the future.
Thanks for reading my blog! It was created for GSoC needs only, but I think I will continue writing when I have something interesting to tell.

    Literature

    [1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

    by Valera Likhosherstov (noreply@blogger.com) at August 18, 2016 08:30 AM

    Shridhar Mishra
    (italian mars society)

    Update! on 10 July-Tango installation procedure for windows 10.

Tango installation on Windows can be a bit confusing since the documentation on the website is old and there are a few changes with the newer versions of MySQL.

    Here's the installation procedure that can be helpful.


    1 : Installation of MySQL
In this installation process MySQL 5.7.11 is installed. Newer versions can also be installed.
During installation we cannot select a custom destination folder, and MySQL is installed in C:\Program Files (x86)\MySQL by default.
During installation it is also mandatory to set at least a 4-character password for root, which wasn't the case for previous versions.
This is against the recommendation from tango-controls, which was specific to older versions of MySQL.

    2 : Installation of TANGO
    You can download a ready to use binary distribution on this site.

    http://www.tango-controls.org/downloads/binary/

Execute the installer. You should specify the destination folder created before: 'c:\tango'.
After installation you can edit the MySQL password.

    3 : Configuration

    • 3-1 TANGO-HOST

        Define a TANGO_HOST environment variable containing both the name of the host on which you plan to run the TANGO database device server (e.g. myhost) and its associated port (e.g. 20000). With the given example, the TANGO_HOST should contain myhost:20000.

        On Windows you can do it simply by editing the properties of your system. Select 'Advanced', 'Environment variables'.

    • 3-2 Create environment variables.

Two new environment variables have to be created to run create_db.bat:
      • 3-2-1 MYSQL_USER

    this should be root

      • 3-2-2 MYSQL_PASSWORD

Fill in the password which was used during the MySQL installation.


    • 3-3 MODIFY PATH

Add this to the Windows PATH for running MySQL queries:
    C:\Program Files (x86)\MySQL\MySQL Server 5.7\bin

    • 3-4 Create the TANGO database tables

Make sure the MySQL server is running; normally it should be.
    Execute %TANGO_ROOT%\share\tango\db\create_db.bat.

    • 3-5 Start the TANGO database:

    execute %TANGO_ROOT%\bin\start-db.bat -v4
The console should show "Ready to accept request" on successful installation.

    • 3.6 Start JIVE

Now you can test TANGO with the JIVE tool, from the Tango menu, or by typing the following command in a DOS window:
        %TANGO_ROOT%\bin\start-jive.bat


Ref: http://www.tango-controls.org/resources/howto/how-install-tango-windows/

    by Shridhar Mishra (noreply@blogger.com) at August 18, 2016 08:25 AM

    Aron Barreira Bordin
    (ScrapingHub)

    Scrapy-Streaming [8/9] - Scrapy With Node.js

    Hi everyone,

    In these weeks, I’ve added support for Node.js Language on Scrapy Streaming. Now, you can develop scrapy spiders easily using the scrapystreaming npm package.

    About

It’s a helper library to ease the development of external spiders in Node.js. It allows you to create the Scrapy Streaming JSON messages.

    Docs

    You can read the official docs here: http://gsoc2016.readthedocs.io/en/latest/js_node.html

    Examples

    I’ve added a few examples about it, and a quickstart section in the documentation.

    PRs

Node.js package PR: https://github.com/scrapy-plugins/scrapy-streaming/pull/10
Examples PR: https://github.com/scrapy-plugins/scrapy-streaming/pull/4
Docs PR: https://github.com/scrapy-plugins/scrapy-streaming/pull/7

    Thanks for reading,

    Aron.

    Scrapy-Streaming [8/9] - Scrapy With Node.js was originally published by Aron Bordin at GSoC 2016 on August 18, 2016.

    by Aron Bordin (aron.bordin@gmail.com) at August 18, 2016 04:24 AM

    August 17, 2016

    liscju
    (Mercurial)

    Work Product submission

So GSoC is nearly over; it's time to summarize what was done:

    • all things mentioned in the proposal were done
• there is support for having large files stored outside of the repository, in a remote location
      • remote location destination can be statically stored in configuration file
      • remote location destination can be generated dynamically with hook
      • for now there is support for storing large files in remote location in
        • local file system
        • http
        • https
      • it's easy to write support for new protocol for communicating with remote destination server
      • solution works for clients with old versions of mercurial
• a repository with the redirection feature turned on doesn't need to store any large files locally; the only occasion on which a file would be downloaded is when the user does things like update/commit
• nearly all the code is located in a single Python file, which keeps the feature's functionality from mixing with the standard largefiles code
• I was blogging about my experience during GSoC at http://liscjugsoc.blogspot.com/
• I created a user manual for the feature: 
      • https://docs.google.com/document/d/16L6fpWb9wMddlzi_UyjjyK1xRlklpJYFwrgzvjVkbM8/edit#heading=h.mtvtgs3ic99g
• I created technical documentation for the project
      • https://docs.google.com/document/d/1aPirM5JwqIuawTo_BmIPYk29WhMHIvvhM_xoLPqCipc/edit#heading=h.w7ygvk976ajk

    All commits can be seen here:
    https://bitbucket.org/liscju/hg-largefiles-gsoc/commits/all?search=%40%3A%3Adev+and+not+%40

    Or all commits in single patch file:
    https://drive.google.com/file/d/0B7faUZAQf-I1N2gtQ3k1ZVBLVkk/view?usp=sharing

    What needs to be done:
    • Merge to the Mercurial Source Code:P








    by Piotr Listkiewicz (noreply@blogger.com) at August 17, 2016 08:18 PM

    Levi John Wolf
    (PySAL)

    Formally Winding down GSOC

    This week, I’ve been really bringing the work on GSOC to a close. Thus, I’ve linked to a notebook where I walk through the various work I’ve done in the project. 

    What a ride. 

    August 17, 2016 06:15 PM

    mkatsimpris
    (MyHDL)

    Work Product

In the following post the overall work that has been done during GSoC is summarized.

Completed Work

• Color Space Conversion Module with parallel inputs and parallel outputs interface
• Color Space Conversion Module with serial inputs and serial outputs interface
• 1D-DCT Module
• 2D-DCT Module
• Zig-Zag Scan Module
• Complete Frontend Part of the Encoder
• Block Buffer
• Input Block Buffer
• All the …

    by Merkourios Katsimpris (noreply@blogger.com) at August 17, 2016 09:46 AM


    Riddhish Bhalodia
    (dipy)

    Last Blog!

So we are near the end of Google Summer of Code (GSoC) 2016, and I am very excited that my work will help researchers and students using DIPY. I had a really wonderful experience working with my mentors and DIPY. GSoC has definitely strengthened my fundamentals and made me a better programmer and a better researcher than before. I would like this blog post to summarize my GSoC work, where I will try to answer the following questions:

    1. What are the algorithms/programs that I have implemented in this GSOC?
    2. What are the aims that were met?
    3. How different are the initial proposal and the final outcome? and so on.

    The links to all my code

    This is a list of comprehensive links to all the merged as well as ongoing pull requests (PRs) and the commits to those PRs:

    PR 1: Local PCA Slow (just the python implementation of [1]) [Soon to be MERGED].
    All commits for PR1

    PR2: Local PCA fast (with the cython implementation) [Resolving a bug]
    All commits for PR2

    PR3: Adaptive Denoising (implemented and optimized [2]) [MERGED]
    All commits for PR3

    PR4: Brain Extraction [Soon to be MERGED]
    All commits for PR4

    PR5: MNI Template Fetcher [MERGED]
    All commits for PR5

    Throughout the blog the above mentioned PRs will be referred by their PR numbers.

    Let me start by describing my initial proposal

    The initial proposal

    There were two primary aims of my GSOC proposal:

1. Local PCA and its Optimization

I had to implement the method described in the paper [1] in Python for DIPY, and optimize the algorithm in Cython to achieve a higher speedup. Along with this, I had to test it on several different datasets and MRI scans.

2. Brain Extraction Method and its Implementation

We had to come up with a plausible idea for a brain extraction method suitable for DIPY and implement it.

    The Final Output

I am pleased to say that we have managed to meet most of the goals set in the GSoC proposal, and more. Breaking it into pieces:

    1. Python implementation of Local PCA [1] (DONE) [PR1]
    2. Cython implementation of Local PCA [1] (DONE except for the bug described in next section) [PR2]
    3. Tests and Documentation for the Local PCA [1] (DONE) [PR1, PR2]
    4. The local PCA algorithm was tested for several different datasets like:
      • Single shell Stanford HARDI Data
      • Multi shell Sherbrooke Data
      • Multi-shell Human Connectome data
      • General Electric (GE) diffusion MRI data

Figure: Local PCA output
    5. Adaptive Soft Coefficient Matching [2] (DONE) [PR3] (Adaptive denoising PR)
    6. Nlmeans (non-local means) block wise averaging and it’s optimization [2,3] (DONE) [PR3] (Adaptive denoising PR)
    7. Tests and Documentation of nlmeans block and adaptive soft coefficient matching [1] (DONE) [PR3] (Adaptive denoising PR). Tests done for T1 as well as diffusion data and it showed significant improvement in both cases.

Figure: Adaptive soft coefficient matching output
    8. Brain extraction Method designed (DONE) [PR4]
    9. Brain extraction Implemented in python (DONE) [PR4]
    10. Brain extraction experiments for different datasets (Some remaining):
• The experiments were done using the IBSR MRI dataset, which includes manual segmentations. This was chosen because it is easily accessible and because the extraction result can be compared with the manually extracted one, giving us a comparative metric to gauge the efficiency of the algorithm.
• The metric we used to compare manual and automatic extraction is the Jaccard index, and we observed a mean Jaccard index of 0.84 for three subjects from the IBSR data (a short sketch of this measure follows the list below).
• The experiment was done using a T1 template (the MNI template) with T1 input data. To test how a change in modality affects the extraction, we also tried it with the T1 template and input data of a different modality (like B0), and it worked out really well.
    11. Brain extraction speedup (DONE) [PR4]
    12. Tests and Documentation of brain extraction (Almost DONE) [PR4]

Figure: Brain extraction output
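For reference, the Jaccard index mentioned in the list above is straightforward to compute from two binary masks (a minimal sketch, not the code used in the experiments):

    import numpy as np

    def jaccard(mask_a, mask_b):
        # |A intersection B| / |A union B| for two binary brain masks.
        a = np.asarray(mask_a, dtype=bool)
        b = np.asarray(mask_b, dtype=bool)
        return float(np.logical_and(a, b).sum()) / np.logical_or(a, b).sum()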

Note: As there are a lot of results, you can look at each one in the blog posts below, whose titles are easy to follow. ( riddhishbgsoc2016.wordpress.com )

    Blog post links

    Here is the list of all my blogposts for GSOC 2016, it will give a more comprehensive explanation for each milestone:

    1. Code Mode On! GSOC 2016 with DIPY
    2. Adaptive Denoising for Diffusion Images
    3. Noise Estimation for LocalPCA
    4. What am I doing wrong?
    5. LocalPCA && Adaptive Denoising
    6. Validation of Adaptive Denoising
    7. Summary Blog
    8. Speeding Up!
    9. Tying up loose ends!
    10. Registration experiments
    11. Brain Extraction Explained!
    12. Brain Extraction Walkthrough!
    13. Last Blog!

    Immediate Next Steps

    A] Fix the fast bug

I have successfully implemented the fast local PCA in Cython [PR2], but upon extensive testing we have found a strange bug. The fast Cython implementation takes less time than the Python-only version in most cases, but on certain systems (some laptops) it does not! This system-specific inconsistency is something we have to resolve quickly, possibly within a couple of weeks.

    B] More experimentation and testing of brain extraction

We need to experiment with the adaptive denoising algorithm when used in combination with brain extraction and see how that affects the performance. We also want to test the brain extraction under several different combinations of modalities of the input and the template data, to see how robust our implementation is.

C] Improving the brain extraction tutorial, and implementing one more data fetcher for a smooth understanding of the tutorial.

D] Make the Cython implementation of Local PCA even faster (which may come under the umbrella of fixing the above described bug).

E] Merge all the PRs.

    Future Directions to Follow

A] OpenMP-based multithreading for both adaptive denoising and local PCA; this can result in an even faster implementation of the local PCA.

B] Short publications on local PCA and brain extraction. The projects led to some exciting directions and we believe we can get some kind of research output from them. I will keep working with the DIPY team on these projects and will keep this blog updated on any new developments in this regard.

    ~Adios

    References

[1] Diffusion Weighted Image Denoising Using Overcomplete Local PCA. Manjón JV, Coupé P, Concha L, Buades A, Collins DL, et al. (2013) PLoS ONE 8(9): e73021.

[2] Adaptive Multiresolution Non-Local Means Filter for 3D MR Image Denoising. Coupé P, Manjón JV, Robles M, Collins DL. IET Image Processing, Institution of Engineering and Technology, 2011.



    by riddhishbgsoc2016 at August 17, 2016 06:09 AM

    meetshah1995
    (MyHDL)

    That's all folks

    If you recall Bugs Bunny from the title, you are awesome :).

    On a more serious note, This is officially the last post of this myhdl-riscv gsoc series :(.

I implemented a RISC-V based processor, Zscale, in MyHDL and validated and verified it with unit and assembly tests.

As of today, the entire core (the Zscale processor) is functional except for the assembly test of the core. My co-GSoCer, +Srivatsan Ramesh, and I are working on it and hopefully should get it done soon. My college started on July 19, which slowed down the last few weeks, which mainly involved testing the core.

Having said that, it gives me immense pleasure to have helped the MyHDL & PSF community, and I would be more than glad if MyHDL users used the riscv repository for their research and development.

It has been a great 3 months with a lot of learning and interaction experiences gained along the way. Shout out to my mentors +Keerthan JC and +Christopher Felton for guiding me through this seemingly difficult task.

    My consolidated contributions can be found here.

I will promote MyHDL at my university and contribute to the main myhdl repository in the coming months as and when I find time.

    Until next time,
    MS.    

    by Meet Pragnesh Shah (noreply@blogger.com) at August 17, 2016 02:19 AM

    John Detlefs
    (MDAnalysis)

    Visualizing Data with t-SNE, part 2

    Soldiering on!

    Yesterday I covered the barebone essentials behind the SNE algorithm. (We haven’t actually arrived at t-SNE yet.)

    Before I move on, there were some bullet points that I didn’t cover:

    1. What does it mean to induce a probability distribution?
    2. What is Shannon entropy?
    3. What is a smooth measure?
    4. What is the crowding problem?

The answer to the first question is pretty simple. Given some $\sigma_i$, we have established a variance around our data point. This assigns a probability to every other point in the set; when every point in the discrete set is accounted for, we have a probability distribution $P_i$.
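In code, the standard SNE construction of $P_i$ looks roughly like this (a NumPy sketch, with X holding the high-dimensional data points as rows):

    import numpy as np

    def conditional_probabilities(X, i, sigma_i):
        # p_{j|i}: Gaussian similarity of every point j to point i,
        # normalised so the values sum to one (a point is not its own neighbour).
        d2 = np.sum((X - X[i]) ** 2, axis=1)
        p = np.exp(-d2 / (2.0 * sigma_i ** 2))
        p[i] = 0.0
        return p / p.sum()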

    So I come from a physical chemistry background, and if you’re like me, when you think entropy, you think the Second Law and Third Law of Thermodynamics. Informally, entropy quantifies the order of a molecular system. A perfectly arranged crystal has entropy equal to 0, while the plasma swirling around in the universe has something much higher.

    From wikipedia:

    Shannon Entropy is a measure of unpredictability of information content.

    English text, treated as a string of characters, has fairly low entropy, i.e., is fairly predictable. Even if we do not know exactly what is going to come next, we can be fairly certain that, for example, there will be many more e’s than z’s, that the combination ‘qu’ will be much more common than any other combination with a ‘q’ in it, and that the combination ‘th’ will be more common than ‘z’, ‘q’, or ‘qu’. After the first few letters one can often guess the rest of the word.

The Shannon Entropy of our probability distribution $P_i$ over the discrete set of data is defined as the expectation value of the information content of X. The information content is always a logarithm of a probability; for Shannon Entropy, this is log base 2, so the entropy is measured in bits.

There might be some more formal definition somewhere, but when I think expectation value of some random variable, in this case the information content of X, I just think multiply at each point by the probability at that point. Summing over all points in a discrete set, the Shannon Entropy $H(P_i)$ is:

$H(P_i) = -\sum_j p_{j \lvert i} \log_2 p_{j \lvert i}$

The Perplexity is simply 2 raised to this number, $Perp(P_i) = 2^{H(P_i)}$, and is considered a 'smooth measure of the effective number of neighbors'.
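
To make that concrete, here is a tiny numpy sketch of the computation (illustrative only, not the reference t-SNE code): given the squared distances from point $i$ to the other points and a candidate $\sigma_i$, build the conditional distribution $P_i$, its entropy, and the perplexity.

import numpy as np

def perplexity(sq_dists, sigma):
    """Perplexity 2**H(P_i) of the conditional distribution induced by sigma."""
    logits = -sq_dists / (2.0 * sigma ** 2)
    logits -= logits.max()                   # for numerical stability
    p = np.exp(logits)
    p /= p.sum()                             # conditional distribution P_i
    h = -np.sum(p * np.log2(p + 1e-12))      # Shannon entropy in bits
    return 2.0 ** h

sq_dists = np.random.rand(99) * 4            # squared distances from point i to the others
print(perplexity(sq_dists, sigma=0.5))

In SNE, $\sigma_i$ is then found by binary search so that this value matches the perplexity specified by the user.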

    What is a smooth measure? Well, this is a whole can of worms. If you’re a measure theorist I have the utmost respect for you. In my limited exposure from some brief introductions from kind Math professors and Wikipedia, a measure allows you to ascribe and compare sizes of subsets in a set. I think you need to take a few measure theory classes, read some books, maybe even get a Ph.D to have a strong understanding of ‘smoothness’.

    Last stop on the SNE train before we get onto part 3 (we’re almost at t-SNE I promise!)

    The crowding problem!

    Before I start, I have to define what it means for points to be mutually equidistant. Consider the two dimensional plane, and an equilateral triangle in the plane. For each vertex, the distance to the other two vertices is the same. A square can never have this property, a corner vertex will always be further away from an opposite corner than it will be from an adjacent corner.

It turns out that this is a property that extends to n-dimensional Euclidean spaces: objects with mutually equidistant vertices can be constructed only when the number of vertices is less than or equal to n+1.

When reducing the dimensionality of a dataset, this fact is problematic. There are more ways for things to be close in high dimensions than there are in lower dimensions. When a bunch of data points are close together on some high-dimensional manifold, the area available for points to spread out on is much larger than in any 2D representation. Close points will occupy too much space in 2D representations. (I really wish I had an iPad so I could easily sketch some examples of what I'm talking about, but I think this is supremely cool.)

The notion of a set of springs finds a great use here; crowding will cause non-neighbors to be positioned in spots that are way too far away. This ruins the gradient descent: the 'spring forces' in the gradient are proportional to the distance, and the large number of small attractive forces between distant points accumulates, pulling all the data into one ugly cluster. (At least that's how I understand it…)

    The next post will cover t-SNE in all its glory, thanks for reading!

    August 17, 2016 12:00 AM

    August 16, 2016

    chrisittner
    (pgmpy)

    PC constraint-based BN learning algorithm

    The past while I have been working on basic constraint-based BN learning. This required a method to perform conditional independence tests on the data set. Surprisingly, such tests for conditional independence are not part of scipy.stats or other statistics libraries.

To test if X _|_ Y | Zs, one has to manually construct the frequencies one would expect if the variables were conditionally independent, namely \(P(X,Y,Zs)=P(X|Zs)\cdot P(Y|Zs)\cdot P(Zs)\), and compare them with the observed frequencies, using e.g. a \(\chi^2\) deviance statistic (provided by scipy.stats). Expected frequencies can be computed as \(\frac{P(X, Zs)\cdot P(Y, Zs)}{P(Zs)}\), so one can start with a joint state_count/frequency table, marginalize out \(X\), \(Y\), and both, and compute the expected distribution from the margins.
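
To make the idea concrete, here is a rough stand-alone sketch of such a test using pandas and scipy.stats, stratifying on the conditioning set (illustrative only; ci_test is a hypothetical helper, not the pgmpy implementation):

import pandas as pd
from scipy.stats import chi2, chi2_contingency

def ci_test(data, X, Y, Zs=()):
    """Chi-square test of X _|_ Y | Zs on a discrete DataFrame; returns (stat, p-value)."""
    Zs = list(Zs)
    if not Zs:
        stat, p, _, _ = chi2_contingency(pd.crosstab(data[X], data[Y]))
        return stat, p
    stat, dof = 0.0, 0
    # within each configuration of Zs, compare observed vs. expected counts
    for _, group in data.groupby(Zs):
        ct = pd.crosstab(group[X], group[Y])
        if ct.shape[0] < 2 or ct.shape[1] < 2:
            continue  # degenerate stratum: X or Y takes a single value here
        s, _, d, _ = chi2_contingency(ct)
        stat, dof = stat + s, dof + d
    if dof == 0:
        return 0.0, 1.0
    return stat, 1 - chi2.cdf(stat, dof)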

    Once such a testing method is in place, the PC algorithm can be used to infer a partially directed acyclic graph (PDAG) structure to capture the dependencies in the data, in polynomial time. Finally, the PDAG can be fully oriented and completed to a Bayesian network. The implementation looks like this:

    Methods of the ConstraintBasedEstimator class:

    test_conditional_independence(self, X, Y, Zs=[])

    Chi-square conditional independence test (PGM book, 18.2.2.3, page 789)

    build_skeleton(nodes, independencies)

    Build undirected graph from independencies/1st part of PC algorithm (PGM book, 3.4.2.1, page 85, like Algorithm 3.3)

    skeleton_to_pdag(skel, seperating_sets)

Orients compelled edges of skeleton/2nd part of PC (Neapolitan, Learning Bayesian Networks, Section 10.1.2, page 550, Algorithm 10.2)

    pdag_to_dag(pdag)

    Method to (faithfully) orient the remaining edges to obtain BayesianModel (Implemented as described here on page 454 last paragraph (in text)).

    Finally three methods that combine the above parts for convenient access:

    estimate(self, p_value=0.01)

    -> returns BayesianModel estimate for data

    estimate_skeleton(self, p_value=0.01)

    -> returns UndirectedGraph estimate for data

    estimate_from_independencies(nodes, independencies)

    -> static, takes set of independencies and estimates BayesianModel.

    Examples:

    import pandas as pd
    import numpy as np
    from pgmpy.base import DirectedGraph
    from pgmpy.estimators import ConstraintBasedEstimator
    from pgmpy.independencies import Independencies
    
    data = pd.DataFrame(np.random.randint(0, 5, size=(2500, 3)), columns=list('XYZ'))
    data['sum'] = data.sum(axis=1)
    print(data)
    
# estimate BN structure:
    c = ConstraintBasedEstimator(data)
    model = c.estimate()
    print("Resulting network: ", model.edges())
    

    Output:

          X  Y  Z  sum
    0     1  3  4    8
    1     3  3  0    6
    2     4  4  1    9
    ...  .. .. ..  ...
    2497  0  4  2    6
    2498  0  3  1    4
    2499  2  1  3    6
    
    [2500 rows x 4 columns]
    
    Resulting network: [('Z', 'sum'), ('X', 'sum'), ('Y', 'sum')]
    

    Using parts of the algorithm manually:

    # some (in)dependence tests:
    data = pd.DataFrame(np.random.randint(0, 2, size=(50000, 4)), columns=list('ABCD'))
    data['E'] = data['A'] + data['B'] + data['C']
    c = ConstraintBasedEstimator(data)
    
    print("\n P-value for hypothesis test that A, C are dependent: ",
          c.test_conditional_independence('A', 'C'))
    print("P-value for hypothesis test that A, B are dependent, given D: ",
          c.test_conditional_independence('A', 'B', 'D'))
    print("P-value for hypothesis test that A, B are dependent, given D and E: ",
          c.test_conditional_independence('A', 'B', ['D', 'E']))
    
    # build skeleton from list of independencies:
    ind = Independencies(['B', 'C'], ['A', ['B', 'C'], 'D'])
    ind = ind.closure()
    skel, sep_sets = ConstraintBasedEstimator.build_skeleton("ABCD", ind)
    print("Some skeleton: ", skel.edges())
    
    # build PDAG from skeleton (+ sep_sets):
    data = pd.DataFrame(np.random.randint(0, 4, size=(5000, 3)), columns=list('ABD'))
    data['C'] = data['A'] - data['B']
    data['D'] += data['A']
    c = ConstraintBasedEstimator(data)
    pdag = c.skeleton_to_pdag(*c.estimate_skeleton())
    print("Some PDAG: ", pdag.edges())  # edges: A->C, B->C, A--D (not directed)
    
    # complete PDAG to DAG:
    pdag1 = DirectedGraph([('A', 'B'), ('C', 'B'), ('C', 'D'), ('D', 'C'), ('D', 'A'), ('A', 'D')])
    print("PDAG: ", pdag1.edges())
    dag1 = ConstraintBasedEstimator.pdag_to_dag(pdag1)
    print("DAG:  ", dag1.edges())
    

    Output:

    P-value for hypothesis test that A, C are dependent:  0.995509460079
    P-value for hypothesis test that A, B are dependent, given D:  0.998918522413
    P-value for hypothesis test that A, B are dependent, given D and E:  0.0
    Some skeleton:  [('A', 'D'), ('C', 'D'), ('B', 'D')]
    Some PDAG:  [('A', 'C'), ('A', 'D'), ('D', 'A'), ('B', 'C')]
    PDAG:  [('A', 'D'), ('A', 'B'), ('C', 'D'), ('C', 'B'), ('D', 'A'), ('D', 'C')]
    DAG:   [('A', 'B'), ('C', 'B'), ('D', 'A'), ('D', 'C')]
    

    by Chris Ittner at August 16, 2016 10:00 PM

    TaylorOshan
    (PySAL)

    Wrapping Up

    In the last week of the GSOC work, the focus is on wrapping everything up, which means finalizing code and documentation, providing useful examples for educational use, and reflecting on the entire project to provide a plan on how to continue to grow the project beyond GSOC 2016.

    The finalized code and documentation will be reflected in the project itself, where as educational materials will be in the form of a jupyter notebook that demonstrate various features of the project on a real life dataset (NYC CITI bike share trips). The notebook can be found here. Other experiments and proto-typing notebooks can be found in this directory.

    In order to systematically reflect on the progress made throughout the project, I will now review the primary features that were developed, linking back to pertinent blog posts where possible.

    API Design and The SpInt Framework

The primary API consists of four user-exposed classes: Gravity, Production, Attraction, and Doubly, which all inherit from a base class called BaseGravity. All of these classes can be found in the gravity script. The user classes accept the appropriate inputs for each of the four types of gravity-based spatial interaction model: basic gravity model, production-constrained (origin-constrained), attraction-constrained (destination-constrained), and doubly constrained. The BaseGravity class does most of the heavy lifting in terms of preparing the appropriate design matrix. For now, BaseGravity inherits from CountModel, which is designed to be a flexible generalized linear model class that can accommodate several types of count models (i.e., Poisson, negative binomial, etc.) and several different types of parameter estimation (i.e., iteratively weighted least squares, gradient optimization, etc.). In reality, CountModel currently supports only Poisson GLM's (based on a customized implementation of statsmodels GLM) and iteratively weighted least squares estimation, which will be discussed further later in this review. In addition, the user may have a continuous dependent variable, say trade flows in dollars between countries, and therefore might want to use a non-count model, like Gaussian ordinary least squares. Hence, it may make more sense in the future to move away from CountModel, and just have the BaseGravity class do the necessary dispatching to the appropriate probability model/estimation techniques.

    Related blog post(s): post one; post two

    Sparse Compatibility

Because the constrained variety of gravity models (Production, Attraction, Doubly) require either N or 2N categorical variables (fixed effects), where N is the number of locations that flows may move between, a very large sparse design matrix is necessary for any non-trivial dataset. Therefore, a large amount of effort was put into efficiently building the sparse design matrix, specifically the sparse categorical variables. After much testing and benchmarking, a function was developed that can construct the sparse categorical variable portion of the design matrix relatively quickly. This function is particularly fast if the set of locations, N, is indexed using integers, though it is still efficient if unique locations are labeled using string identifiers. The sparse design matrix allows more efficient model calibration. For example, for a spatial system akin to all of the counties in the US (~3k locations or ~9 million observations), it takes less than a minute to calibrate a production-constrained model (3k fixed effects) and about two minutes for a doubly-constrained (6k fixed effects) model on my notebook. It was decided to use normal dense arrays for the basic Gravity model since it does not have fixed effects by default, has dense matrices, and therefore becomes inefficient as the number of observations grows if sparse matrices are used.
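
For illustration, a minimal way to build such a sparse fixed-effects block with scipy.sparse might look like the following (a sketch only; sparse_categorical is a hypothetical helper, not the SpInt function):

import numpy as np
from scipy import sparse as sp

def sparse_categorical(labels):
    """Return an (n_obs x n_categories) CSR one-hot matrix plus the category labels."""
    labels = np.asarray(labels)
    cats, col_idx = np.unique(labels, return_inverse=True)   # integer code per observation
    row_idx = np.arange(labels.shape[0])
    data = np.ones(labels.shape[0])
    mat = sp.csr_matrix((data, (row_idx, col_idx)), shape=(labels.shape[0], cats.size))
    return mat, cats

# usage: origin fixed effects for a production-constrained model
origins = np.array(['a', 'c', 'a', 'b', 'c', 'c'])
X_fe, cats = sparse_categorical(origins)
print(cats, X_fe.toarray())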

    Related blog post(s): post three; post four; post five

    Testing and Accounting for Overdispersion

Since the number of potential origin-destination flow observations grows quadratically as the number of locations increases, and because often there are no trips occurring between many locations, spatial interaction datasets can often be overdispersed. That is, there is more variation in the data than would be expected given the mean of the observations. Therefore, several well known tests for overdispersion (Cameron & Trivedi, 2012) in the context of count (i.e., Poisson) models were implemented. In addition, a QuasiPoisson family was added that can be activated using the Quasi=True parameterization. If it is decided to accommodate probability models other than Poisson, as previously discussed, then the Quasi flag would be replaced with a family parameter that could be set to Gaussian, Poisson, or QuasiPoisson. The purpose of the QuasiPoisson formulation is to calibrate a Poisson GLM that allows the variance to be different from the mean, which is a key assumption in the default Poisson model.

    Related blog post(s): post six

    Exploratory Spatial Data Analysis for Spatial Interaction

To explore the spatial clustering nature of raw spatial interaction data, an implementation of Moran's I spatial autocorrelation statistic for vectors was completed. Then several experiments were carried out to test the different randomization techniques that could be used for hypothesis testing of the computed statistic. This analytic is good for exploring your data before you calibrate a model, to see if there are spatial associations above and beyond what you might expect from otherwise random data. More will be said about the potential to expand this statistic at the end of this review in the 'Unexpected Discoveries' section.

    Related blog post(s): post seven

    Investigating Non-Stationarity in Spatial Interaction

A method called local() was added to the Gravity, Production, and Attraction classes that allows the models to be calibrated separately for each single location, so that a set of parameter estimates and associated diagnostics is acquired for individual subsets of the data. These results can then be mapped, either using python or other conventional GIS software, in order to explore how relationships change over space.

    Related blog post(s): post seven

    Spatial Weights for Spatial Interation

In order to carry out the vector-based spatial autocorrelation statistics, as well as various types of spatial autoregressive model specifications, it is necessary to define spatial associations between flows using a spatial weight matrix. To this end, three types of spatial weights were implemented, which can be found in the spintW script in the weights module. The first is an origin-destination contiguity-based weight that encodes two flows as neighbors if they share either an origin or a destination. The second weight is based on a 4-dimensional distance (origin x, origin y, destination x, destination y) where the strength of the association decays with larger distances. The third is a set of network-based weights that use different types of adjacency of flows represented as an abstract or physical network.

As part of this work I also had the opportunity to contribute some speed-ups to the DistanceBand class in the Distance script of the weights module, so that it avoids a slow loop and can leverage both dense and sparse matrices. In the case of the 4-dimensional distance-based spatial weight, the resulting spatial weight is not sparse and so the existing code could become quite slow. Now it is possible to set the boolean parameter build_sp=False, which will be more efficient when the distance-weighted entries of the weight matrix are increasingly non-zero.

    Related blog post(s): post eight

    Accounting for Spatial Association in Spatial Interaction Models

It has recently been proposed that, due to spatial autocorrelation in spatial interaction data, it is necessary to account for the spatial association, otherwise the estimated parameters could be biased. The solution was a variation of the classic spatial autoregressive (i.e., spatial lag) model for flows, which could estimate an additional parameter to capture spatial autocorrelation in the origins, in the destinations, and/or in a combination of the origins and destinations (LeSage and Pace, 2008). Unfortunately, no code was released to support this model specification, so I attempted to implement this new model. I was not able to completely replicate the model, but I was able to extend the existing pysal ML_Lag model to estimate all three autocorrelation parameters, rather than just one. I have also attempted to re-derive the appropriate variance-covariance matrix, though this will take some more work before it is completed. More on this in the 'Moving Forward' section found below.

    Related blog post(s): post nine

    Assessing Model Fit for Spatial Interaction Models

Several metrics were added for assessing model fit. These include McFadden's pseudo r-squared (based on likelihood ratios), the adjusted McFadden's pseudo r-squared to account for model complexity, D-squared or percent deviance (based on the deviance ratio) and its adjusted counterpart, the standardized root mean square error (SRMSE), and the Sorensen similarity index (SSI). The D-squared statistics and the pseudo r-squared statistics are properties of the GLM class, while the SRMSE and SSI metrics have been added as properties of the BaseGravity class. However, the functions to compute the SSI and SRMSE are stored in the utils script since they may also be useful for deterministic non-gravity type models that could be implemented in the future.
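
For reference, the standard formulations of the last two metrics can be sketched as follows (illustrative only, not necessarily identical to the utils implementations):

import numpy as np

def srmse(observed, predicted):
    """Standardized RMSE: root mean square error divided by the mean observed flow."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((observed - predicted) ** 2)) / observed.mean()

def ssi(observed, predicted):
    """Sorensen similarity index: mean of 2*min(obs, pred)/(obs + pred) over all flows."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # assumes obs + pred > 0 for every flow pair
    return np.mean(2 * np.minimum(observed, predicted) / (observed + predicted))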

    Related blog post(s): post ten

    Unexpected Discoveries

While implementing the vector-based spatial autocorrelation statistic, it was noticed that one of the randomization techniques is not exactly random, depending on the hypothesis that one would like to test. In effect, when you produce many permutations and compare your test statistic to a distribution of values, you would find that you are always rejecting your statistic. Therefore, there is additional work to be done here to further define the different possible hypotheses and the appropriate randomization techniques.

    Leftovers and Moving Forward

Future work will consist of completing the spatial interaction SAR model specification. It will also include adding gradient-based optimization of likelihood functions, rather than solely iteratively weighted least squares. This will allow the implementation of other model extensions such as the zero-inflated Poisson model. I was not able to implement these features because I was running short on time and decided to work on the SAR model, which turned out to be more complicated than I originally expected. Finally, future work will also incorporate deterministic models, spatial effects such as competing destinations and eigenvector spatial filters, and neural network spatial interaction models.

    by Taylor Oshan at August 16, 2016 05:13 PM

    GSoC week 12 roundup

    @cfelton wrote:

This is the last roundup. As posted in previous GSoC roundups, the GSoC program has outlined that all students must have their final code committed by 20-Aug. If you have not committed your final code, make sure to do so in the next couple of days and prepare your final blog post that will be used in your evaluation submission.

    IMPORTANT NOTE TO MENTORS
All mentors need to provide a summary of the student's final evaluation to me (@cfelton) via email by 22-Aug. The assigned mentors were never corrected in the GSoC system, so I will need to complete all the final evaluations again. Because of schedule conflicts and PSF requirements I will be completing all the evaluations by the 23rd; please provide the final review as soon as possible.

    ** Student final project submission **
Students, make sure to have your final blog post ready; the final post should review what you completed and what is outstanding. I should be able to easily understand what is working in the projects, what is missing, and what doesn't work. This detailed final post is required for a passing evaluation.

The GSoC work product submission guidelines outline what the final post should have. Take the time required to generate the final blog post that you will link in your submission; it should have:

    1. Description of the work completed.
    2. Any outstanding work if not completed.
    3. Link to the main repository.
    4. Links to the PRs created during the project.

    Review the submission guidelines page in detail.

    The idea of GSoC isn't that students churn out code -- it's important that the code be potentially useful to the hosting Open Source project!

    Also make sure the README on the project repositories is complete, it should give an overview of the project and instructions for a user to get started: install, run tests, the core interfaces, and basic functional description.

    Student week12 summary (last blog, commits, PR):

    jpegenc:
    health 87%, coverage 97%
    @mkatsimpris: 12-Aug, >5, Y
    @Vikram9866: 07-Aug, >5, Y

    riscv:
    health 96%, coverage 51%
    @meetsha1995: 11-Aug, >5, Y
    @srivatsan: 14-Aug, >5, N

    gemac:
    health 93%, coverage 92%
    @ravijain056, 02-Aug, >5, N

    Students and mentors:
    @mkatsimpris, @vikram, @meetshah1995, @Ravi_Jain, @sriramesh4,
    @forumulator,
    @jck, @josyb, @hgomersall, @martin, @guy.eschemann, @eldon.nelson,
    @nikolaos.kavvadias, @tdillon, @cfelton, @andreprado88, @jos.huisken

    Links to the student blogs and repositories:

    Merkourious, @mkatsimpris: gsoc blog, github repo
    Vikram, @Vikram9866: gsoc blog, github repo
    Meet, @meetshah1995, gsoc blog: github repo
    Srivatsan, @srivatsan-ramesh: gsoc blog, github repo
    Ravi @ravijain056: gsoc blog, github repo
    Pranjal, @forumulator: gsoc blog, github repo

    Posts: 10

    Participants: 6

    Read full topic

    by @cfelton Christopher Felton at August 16, 2016 11:04 AM

    Redridge
    (coala)

    coala GSoC 2016: Final Report

    Thanks

    coala GSoC 2016: Final Report

    Before I start on the work summary I would like to thank:

    • Lasse for introducing me to coala and helping me with (almost) any problem I had during this GSoC.
    • Udayan for being a cool guy with good humor (if I say so myself) as well as a good mentor.
    • Mischa for helping me with functional python, decorators and excellent reviewing.
    • Fellow GSoC students that I met at Europython, Adrian, Adhityaa and Tushar for the awesome community bonding.
    • Max because he has dreadlocks. Also the meaningful life teachings and the cooking.
    • Last but not least Google for the sponsorship.

    Work Summary

    I have several pull/merge requests across the different coala repos which I will list here with links so you can check them for yourself.

    Pull/Merge Request Description Status
    coala/2198 Add external_bear_wrap decorator Merged
    coala/2407 Modify the JSON spec used by the decorator Merged
    coala/2452 Migrate some libs to coala-utils Merged
    coala/2460 Bump version for coala-utils Merged
    coala/2583 Add external bear tutorial Merged
    coala-bear-management/3 Extend tool to support external bears Merged
    coala-utils/5 Refactor from coala-decorators to coala-utils Merged
    coala-utils/7 Migrate StringConverter from coala core Merged
    coala-utils/15 Modification for backwards compatibility Merged
    coala-utils/19 Revert changes in yield_once as a fix Merged
    coala-utils/20 Add open_files context manager Merged
    coala-bears/617 Extend tool to support conda packaging Pending

    Goals

The most important part of the project was to be able to write bears in other languages. I can proudly say that it is now possible to write such an "external" bear.

    Some other achieved goals are:

    • Bear creation tool
    • External bear proof of concept with tutorial

    Some work left to do:

    • Merge packaging tool extension
    • Add Diff handling to external bears

    Now that we have come to an end I can say that the toughest challenge by far was the code merging process since coala has a very strict reviewing workflow.

    Wrap Up

So that is it for GSoC 2016. It was an awesome experience in which I learnt a lot of stuff (not only programming related) and met a lot of cool people. I would definitely recommend at least trying to join the program. Worst case scenario, you will have contributed to an open source community, something whose importance I explained in the very first post of my blog.

    That's it from me, feel free to pm me about any questions related to the project (and not only) on the coala gitter channel.

    Alex

    by Alexandros Dimos at August 16, 2016 07:41 AM

    udiboy1209
    (kivy)

    One Hell Of A Summer!

I always wondered how large-scale projects and organisations got formed and got to the stage where they currently are, where there are tens or even hundreds of people maintaining them and constantly contributing to make them better. I wondered how it would feel to see a project build up from the first line of code! Google Summer of Code allowed me to experience that! It truly has been an overwhelming and exciting summer of code!

    I got to see the very beginning of the maps module, from not just the first line of code but the initial concept, the plan of coding it and everything. I know it is just a small part of kivent but it has certainly been huge for me compared to the projects I have done before!

    I will try to describe my project from the very beginning in this post.

    Initial Idea

KivEnt is a game engine and the one thing every game engine needs is a module for displaying tile maps. Tile maps make it simple to design game levels and fields. For example, without a map module a pokemon game developer would have to individually place each grass tile, sand tile and water tile manually; he would have to decide where each cliff edge goes so that the cliffs look elevated in 3D. You are already thinking of ways to automate all this, aren't you? Yeah, just store a 2D array with which tile to render in each position, and we can make the 2D array separately. Why don't we store the array in a file in some standardised format so we could create that file externally? Layering would just require multiple 2D arrays!

    pokemon map created in Tiled

These are such fundamental requirements of games that there are a lot of tools out there just for creating that external file I mentioned above. A very famous editor is the Tiled Map Editor. Tiled has a fixed file format known as TMX, and so to display a map created in Tiled on KivEnt, we need a module to take all that data from the TMX and render it correctly on the KivEnt canvas. That is essentially what the map module does. But just displaying those tiles and forgetting about them isn't enough. We need to be able to access and modify every tile, hence the module also has a good API to access the data.

    Fundamental Requirements

KivEnt runs on the entity-component architecture, which in the simplest sense means that each object in the game is an entity which has data components for each system that controls its properties. So each tile of the map has to be an entity and have some component which relates it to a map. Hence we have the MapSystem, which controls the position of each tile on the map (row, col). The first requirement therefore was to set up such a system and the corresponding component. Also, each tile could have additional properties like animation and hence be a part of other systems.

Next, we need a way to efficiently store all the data about which place on the map has which tile. Contiguous allocation is the best way because the tile at row m and column n is at element m * map_width + n in the array! We do such array allocation at the Cython level to have greater control over the memory used. But to access this data in code, we also need to have Python APIs for all this data. I built wrapper API classes for both a single tile and the whole tilemap, which was the second requirement.
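
As a tiny illustration of that indexing scheme (plain Python only, not the actual Cython-backed KivEnt code):

class TileMapData(object):
    def __init__(self, width, height, default=None):
        self.width = width
        self.tiles = [default] * (width * height)   # contiguous storage

    def get_tile(self, row, col):
        return self.tiles[row * self.width + col]   # element row * width + col

    def set_tile(self, row, col, value):
        self.tiles[row * self.width + col] = value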

    • PR #141: This is the PR where I built this first minimum viable prototype of the module.
    • PR #142: Added animation to tiles in this PR.
    • PR #143: Fixed a batching bug in the AnimationSystem which prevented textures from two different sources being added as animation frames together.

I then rendered this kinda trippy color tiles map. There were no automatic entity creation features, so I had to set the tile in each place randomly, but it's a good test case.

    First working prototype

    Loading TMX files

Now that we have a setup which can store tile map data and easily render it as entities in the game, we just need to get the data from the TMX to the TileMap object. For that we first need something which can parse the XML in TMX and get the data. I decided to use python-tmx which is simple and pretty lightweight. It wraps up the XML data into python data objects. There were some fields which were not being read by the module because it wasn't up to date with the latest TMX format. So I submitted patches to it for implementing the missing bits.

    • Patch #1: For loading animation frames correctly.
    • Patch #2: For hexsidelength parameter for hexagonal tiles.

    Basic TMX parser

    There are a lot of steps involved in loading TMX files. For a basic idea, each tile would require a texture, a model and possibly an animation to display it. This previous blog post covers the details of the parser.

    • PR #149: Basic TMX parser implementation for orthogonal tiles.

    Next step was to support other tile formats like hexagonal and isometric

    Hexagonal and Isometric tiles

    Hexagonal tiles have a form of arrangement called staggered arrangement. We can have the same kind of arrangement for isometric to get an isometric staggered map.

This is how tiles in a staggered arrangement would look:

    o   o   o   o
    o o o o o o o o
    o o o o o o o o
    o o o o o o o o
      o   o   o   o
    

    The above arrangement has stagger index as even and stagger axis as x axis. This just means that every even indexed tile along the x-axis will be shifted along the y-axis by 1.

    If the stagger index was odd and the stagger axis was y axis, this would be the outcome.

      o o o o o o o
    o o o o o o o 
      o o o o o o o
    o o o o o o o 
      o o o o o o o
    o o o o o o o 
    

Now consider if these tiles were hexagonal or isometric in shape: with the correct spacing and positioning we could create a map using the staggered arrangement. All we have to add to the code is how to get the position of a tile from (i,j). Similar logic applies for the isometric arrangement.
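
Here is a rough sketch of that positioning idea for a plain staggered layout (simplified and illustrative only: real hexagonal maps also shrink the spacing along the stagger axis using the hex side length, and this is not KivEnt's actual formula):

def staggered_position(row, col, tile_w, tile_h,
                       stagger_axis='x', stagger_index='even'):
    parity = 0 if stagger_index == 'even' else 1
    x = col * tile_w
    y = row * tile_h
    if stagger_axis == 'x' and col % 2 == parity:
        y += tile_h / 2.0   # staggered columns shift half a tile along y
    elif stagger_axis == 'y' and row % 2 == parity:
        x += tile_w / 2.0   # staggered rows shift half a tile along x
    return x, y

# e.g. staggered_position(row=2, col=2, tile_w=64, tile_h=64) -> (128, 160.0)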

    Here are some examples:

    hexagonal

    isometric

    staggered

    PR #153 is where I added this feature. Figuring out the formula for the position from (i,j) was really interesting! I will probably describe it in another blog post.

    Shapes and other Objects

    Maps may have something other than just tiles drawn on them, like circles, polygons and images. Tiled has a way to store all that data, and so our module has to be able to display it. For drawing shapes we use KivEnt’s VertexModel.

VertexModel is a set of vertices and a list of indices which indicate what triangles to draw to create the required shape. So essentially all polygons can be represented as multiple adjoining triangles using these vertices and indices. We can also display ellipses using a high number of triangles. So there is a util function which takes the vertex data of the shapes in the TMX and converts them to suitable data for the VertexModel. To display the shape all we have to do is create an entity with this model as its render data. Images are trivial too because they will be rendered separately as any other image with a rectangular model and texture.

    This is how the objects look on screen:

    objects

    I added this feature in PR #154.

    The End

This was the entire work I did as part of GSoC, ending with PR #156 for documentation and a bit of code cleanup. In the community bonding period I had worked on the AnimationSystem in PR #131, which was required for the maps module.

This has really been the most exciting project I have ever done! I thank Google Summer of Code, Kivy, and Jacob Kovak, my mentor, for giving me this great opportunity and helping me complete it!

    Check out the tutorial to use the maps module with KivEnt here

    August 16, 2016 12:00 AM

    John Detlefs
    (MDAnalysis)

    Visualizing Data with t-SNE, part 1

    Watching Katie Ledecky beat the field by 11 seconds in the 800M swim at the Olympics has led me to the conclusion that I am out of shape. I will never physically be where Katie Ledecky is, but I can use her work ethic as an inspiration to try a little harder academically. Using the Morning Paper as an example, I am going to try to Ledeckify my life a little bit and work hard at becoming more (intellectually) fit every day.

    Today’s paper is ‘Visualizing Data with t-SNE’, and given my lack of expertise, I have decided to split this paper into posts. This post will cover:

    Stochastic Neighbor Embedding (SNE)

    Which requires us to talk about:

    • Data Visualization
    • Various Probability Distributions
    • The Crowding Problem
    • Gradient Descent
    • Shannon Entropy
    • Kullback-Leibler Divergence

As a refresher, the problem being addressed here is extracting structure from a set of data. Van der Maaten refers to the dimension-reduced data as a map $Y$, while the dimension reduction applied to a particular point in the dataset is referred to as a map point $y_i$. The number of features in a data point can be very high, while its intrinsic dimensionality can be much lower.

    Van der Maaten brings up a typical handwritten digit dataset; a set of size-normalized (m-by-n pixel) images will be a set of points in a $R^{mn}$ dimensional vector space. The intrinsic dimensionality of this data should be 10, reflecting the digits 0 through 9. Extracting this structure is hard for computational algorithms and easy for humans.

For the purpose of visualizing data, it really only makes sense to reduce data to 2 or 3 dimensions. (If you're a contrarian you're thinking about color as an extra dimension, tesseracts and Trafalmadorians…) In this 2D or 3D representation, we would like similar points to cluster together. Using a k-nearest neighbor classifier on the dimension-reduced data should then ideally accurately predict what a new digit actually is.

    How does Stochastic Neighbor Embedding try to solve this problem?


    Stochastic Neighbor Embedding (SNE) starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities.

This conditional probability is calculated in the standard way, with a Gaussian centered at $x_i$. In this case:

$p_{j \lvert i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$

    A crucial part is finding the appropriate variance $\sigma_i$ to describe the data.

    For the low dimensional counterparts $y_i$ and $y_j$ of the high-dimensional datapoints $x_i$ and $x_j$ it is possible to compute a similar probability.

Because $\sigma_i$ is just some number, we can set it to $\frac{1}{\sqrt{2}}$ and things simplify to:

$q_{j \lvert i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}$

    SNE finds a positioning of points in the low dimensional representation Map that attempts to minimize the difference between $q$ and $p$. It does this by minimizing the sum of the Kullback-Leibler divergences using a gradient descent method.

What is Kullback-Leibler divergence you ask? This is the expectation of the logarithmic difference between two probabilities, $KL(P \| Q) = \sum_j p_j \log \frac{p_j}{q_j}$. Using this as a cost function is good because it punishes a map that misrepresents points that are close in $X$ as distant in $Y$, but doesn't do much for points that are close in $Y$ but not so in $X$.

    If you’re confused reading the paper, good, because so am I… Van der Maaten’s (VDM) explanation explains the algorithm out of order in a way that is very confusing to me. The order of the algorithm is:

    1. Use a binary search to find the appropriate $\sigma_i$ such that the probability distribution is equal to a Perplexity specified by the user

    2. Use a stochastic gradient descent to organize the map $Y$.

    3. Repeat and choose the optimal result

    I can’t speak for you, but the explanation in this paper doesn’t do a great job at explaining the algorithm. I had to search out the original paper for a linear explanation. In the paper the formula for $p_{i \lvert j}$ is given and $\sigma_i$ is defined. VDM says “The method for determining the value of $\sigma_i$ is presented later in the section.” I feel like if I was writing this paper I’d want to make it explicitly clear that this step is performed prior to the steps that are mentioned following that quote, but this fact is omitted and expected to be understood by the reader. (Not the way my mind works…)

In this paper, VDM states that the gradient descent minimization samples across 'reasonable values of the variance of the Gaussian in the high dimensional space $\sigma_i$'; tomorrow I am going to go in and try to grok some SNE code to get a better idea of how this is done in real life.

The heart of this algorithm is explained by the gradient, and VDM presents this sickeningly interesting way of thinking about the gradient:

    Physically, the gradient may be interpreted as the resultant force created by a set of springs between the map point $y_i$ and other map points $y_j$. All springs exert a force along the direction $(y_i -y_j)$. The spring between $y_i$ and $y_j$ repels or attracts the map points depending on whether the distance between the two in the map is too small or too large to represent the similarities between the two high-dimensional data points. The force exerted by the spring between $y_i$ and $y_j$ is proportional to its length, and also proportional to its stiffness, which is the mismatch $(p_{j \lvert i} −q_{j \lvert i} + p_{i \lvert j} −q_{i \lvert j})$ between the pairwise similarities of the data points and the map points.
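
For reference, the SNE gradient that this spring analogy describes (in the notation used above) is:

$\frac{\partial C}{\partial y_i} = 2 \sum_j \left(p_{j \lvert i} - q_{j \lvert i} + p_{i \lvert j} - q_{i \lvert j}\right)\left(y_i - y_j\right)$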

This gradient descent is not a convex optimization; each time we reach a minimum, we don't know if it's global. In addition, in order to ensure that the same crappy local minima aren't reached by the descent, a momentum term and perturbative Gaussian noise are added to the gradient. The paper continues to be annoyingly vague in this algorithmic description (it isn't really an SNE paper after all, I shouldn't get too mad…). The gradient descent initializes a set of map points sampled from an isotropic Gaussian with a small variance centered around the origin; this variance has nothing to do with the variance $\sigma_i$ discussed above.

That's it for today… it was a bit of a struggle, and I didn't even cover all the points I wanted to, but alas, I'm tired. In the next posts, I will address some other parts of SNE, cover my efforts to understand some stuff with linear programming RMSD, and how to make cythonized code into a working Python module.

    Thoughts, interesting quotes:

    “A protein has a lot more ways to be close to another protein than a line has ways to be close to another line”

    “A point on a two dimensional manifold has 3 mutually equidistant points, a point on an m-dimensional manifold has m+1 equidistant points”

    August 16, 2016 12:00 AM

    August 15, 2016

    Kuldeep Singh
    (kivy)

    Before End-Term Evaluation

The GSoC is about to end and I am very excited 😀. Everything went well, except for a brief stretch in between after my last blog when I got occupied by some placement work at my college, The LNM Institute of Information Technology (awesome place).

Ok, so this week I have been working on documenting my PRs and making everything mergeable. There are some pull requests which I don't think will be merged soon, so I will keep working on them even after GSoC.

    It was fun and an awesome experience of learning and interacting with kivy community.

     


    by kiok46blog at August 15, 2016 04:59 PM

    kaichogami
    (mne-python)

    GSoC Final Report

As per the requirements of GSoC, this article consists of a brief description of my work: what was done, what could not be done, what is left to do, what the future plans for the project are, links to work that was merged, and links to patches that were not.
My project involved refactoring the decoding modules to comply with scikit-learn. Each heading contains a link to its respective pull request. All commits are referenced at the end of the article. I have created a jupyter notebook detailing my changes.

    Project Work

    Xdawn Refactoring

    Patch link

I started my work by refactoring the xdawn module. The xdawn now works with numpy arrays, takes fewer parameters in __init__ and is therefore faster than before. The lightweight implementation also enables pipelining it with other preprocessing steps. I wrote a simple script to compare the time of the original Xdawn and the refactored version, named XdawnTransformer, with a data matrix of shape (288, 59, 61).

    Time taken to initialize and fit Xdawn
    0.0634717941284
    time taken to initialize and fit Xdawntransformer
    0.0294799804688
    Time taken to transform Xdawn
    0.0195329189301
    Time taken to transform Xdawntransformer
    0.00475907325745
    
    

There was around a 53% and 75% decrease in time for running fit and transform, respectively.

    Unsupervised Spatial Filter

    Patch link

Initially written by my mentor, this class uses scikit-learn decomposition algorithms (mainly variations of ICA and PCA) and applies them to epochs data. EEG/MEG feature dimensions are high, which increases the complexity and decreases the efficiency of operations. PCA and ICA are common approaches to reduce the dimensionality of such matrices.

    Vectorizer

    Patch link

The scikit-learn API follows a convention of working with 2D arrays. To make the decoding modules compatible, data used with internal MNE functions (which work on arrays of higher dimensions) needs to be reshaped or converted into a 2D array. Vectorizer does exactly this conversion of higher-dimensional data into 2D when placed in the step previous to a scikit-learn transformer or estimator.
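
As a quick illustration of where it sits in a pipeline, here is a minimal sketch (assuming Vectorizer is importable from mne.decoding, as in recent MNE releases; the data here is random stand-in data, not a real recording):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import Vectorizer

X = np.random.randn(40, 8, 120)   # (n_epochs, n_channels, n_times)
y = np.random.randint(0, 2, 40)   # binary labels

clf = make_pipeline(
    Vectorizer(),                 # (40, 8, 120) -> (40, 960)
    StandardScaler(),
    LogisticRegression(solver='liblinear'),
)
clf.fit(X, y)
print(clf.predict(X[:5]))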

    Ongoing Work

    Temporal Filter

    patch link

A minor refactor of the existing FilterEstimator class, which applies zero-phase low-pass, high-pass, band-pass, or band-stop filters to epochs data. The new class, called TemporalFilter, does not take an info parameter and works with sfreq, keeping it as light as possible. A few other internal changes include changing the default parameters and using functions from filter.py instead of writing checks.

    Scoring method in SearchLight

    patch link

The SearchLight class, written by Jean, uses the default scoring method of the estimator as initialized in the constructor. However, in some use cases the scoring method needs to be changed. Also, while evaluating the score with the cross_val_score method of scikit-learn, being able to change the scorer is convenient. I am currently working on resolving this issue.

    Ideas Discarded

During the start of GSoC, the discussion with my mentors was mainly about how the new decoding API should look. Being a newcomer, I was confused about what really was required. However my doubts were then clarified, thanks to the patience of my mentors.

    VectorizerMixin

    patch link

Initially we decided to go with the idea of using a mixin class to internally convert all the data to 2D in the output, and convert the input back to 3D, since we decided the decoding classes would only accept and return 2D arrays (MNE functions work with 3D data). This class could also be placed at the beginning of the pipeline so that the epochs data is converted to 2D. However, this idea was discarded as it would have involved a lot of refactoring.

    Location of new classes

    patch link

Initially we decided to keep all the work to be done during GSoC in a separate file called gsoc.py; however, Alexander was against it.

    Work Left and Future

The decoding modules need a lot of work. Jean has nicely organised the work that is done, and what is still left, here. I plan to stick around and work on the remaining modules that still need rework.

Finally, I am extremely grateful for the chance that GSoC provided. I learnt a lot about code refactoring, API design and writing code beautifully (my first PR and subsequent PRs clearly show the improvement). I am honored to have worked with such a good community, which was nice to a newcomer and responded to any queries I had.
Lastly, I thank my mentors for being extremely helpful; this project was only possible because of them.

    Links to commits

     


    by kaichogami at August 15, 2016 12:37 PM

    Utkarsh
    (pgmpy)

    MCMC: Hamiltonian Monte Carlo and No-U-Turn Sampler

The random-walk behavior of many Markov Chain Monte Carlo (MCMC) algorithms makes the Markov chain's convergence to the target distribution inefficient, resulting in slow mixing. In this post we look at two MCMC algorithms that propose future states in the Markov chain using Hamiltonian dynamics rather than a probability distribution. This allows the Markov chain to explore the target distribution much more efficiently, resulting in faster convergence.

    Hamiltonian Dynamics

Before we move our discussion about Hamiltonian Monte Carlo any further, we need to become familiar with the concept of Hamiltonian dynamics. Hamiltonian dynamics are used to describe how objects move throughout a system. Hamiltonian dynamics are defined in terms of the object's location $x$ and its momentum $p$ (equivalent to the object's mass times velocity) at some time $t$. For each location of the object there is an associated potential energy $U(x)$, and with the momentum there is an associated kinetic energy $K(p)$. The total energy of the system is constant and is called the Hamiltonian $H(x, p)$, defined as the sum of the potential and kinetic energy:

$H(x, p) = U(x) + K(p)$

The partial derivatives of the Hamiltonian determine how position and momentum change over time $t$, according to Hamilton's equations:

$\frac{dx_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial x_i}$

The above equations operate on a d-dimensional position vector $x$ and a d-dimensional momentum vector $p$, for $i = 1, \ldots, d$.

Thus, if we can evaluate $\partial U / \partial x_i$ and $\partial K / \partial p_i$, and have a set of initial conditions, i.e. an initial position $x_0$ and initial momentum $p_0$ at time $t = 0$, then we can predict the location and momentum of the object at any future time $t$ by simulating these dynamics for a duration $t$.

    Discretizing Hamiltonian’s Equations

    The Hamiltonian’s equations describes an object’s motion in regard to time, which is a continuous variable. For simulating dynamics on a computer, Hamiltonian’s equations must be numerically approximated by discretizing time. This is done by splitting the time interval into small intervals of size .

    Euler’s Method

The best-known way to approximate the solution to a system of differential equations is Euler's method. For Hamilton's equations, this method performs the following steps, for each component of position and momentum (indexed by $i = 1, \ldots, d$):

$p_i(t + \epsilon) = p_i(t) - \epsilon \frac{\partial U}{\partial x_i}(x(t))$

$x_i(t + \epsilon) = x_i(t) + \epsilon \frac{\partial K}{\partial p_i}(p(t))$

Even better results can be obtained if we use the updated value of the momentum in the position update:

$x_i(t + \epsilon) = x_i(t) + \epsilon \frac{\partial K}{\partial p_i}(p(t + \epsilon))$

This method is called the Modified Euler's method.

    Leapfrog Method

    Unlike Euler’s method where we take full steps for updating position and momentum in leapfrog method we take half steps to update momentum value.

    Leapfrog method yields even better result than Modified Euler Method.
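
As a quick illustration, a single generic leapfrog step as described above could be written like this (a sketch only, not the pgmpy LeapFrog class; grad_U is assumed to return the gradient of the potential energy and m is a unit mass by default):

def leapfrog_step(x, p, epsilon, grad_U, m=1.0):
    """One leapfrog update of position x and momentum p with stepsize epsilon."""
    p_half = p - 0.5 * epsilon * grad_U(x)            # half step for momentum
    x_new = x + epsilon * p_half / m                  # full step for position
    p_new = p_half - 0.5 * epsilon * grad_U(x_new)    # remaining half step for momentum
    return x_new, p_new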

    Example: Simulating Hamiltonian dynamics of a simple pendulum

Imagine a bob of mass $m$ attached to a string of length $l$ whose one end is fixed. The equilibrium position of the pendulum is at $x = 0$. Now, keeping the string stretched, we move the bob some horizontal distance $x$. The corresponding change in potential energy is given by

$U(x) = m g \Delta h$,

where $\Delta h$ is the change in height and $g$ is the gravity of earth.

Using simple trigonometry one can derive the relationship between $\Delta h$ and $x$: $\Delta h = l\left(1 - \cos\left(\arcsin(x/l)\right)\right)$.

The kinetic energy of the bob can be written in terms of its momentum $p$ as

$K(p) = \frac{p^2}{2m}$

Further, the partial derivatives of the potential and kinetic energy can be written as:

$\frac{\partial U}{\partial x} = \frac{m g x}{\sqrt{l^2 - x^2}}$ and $\frac{\partial K}{\partial p} = \frac{p}{m}$

Using these equations we can now simulate the dynamics of a simple pendulum using the leapfrog method in Python.

    from __future__ import division
    import matplotlib.pyplot as plt
    import numpy as np
    
    epsilon = 0.025  # Stepsize
    num_steps = 98  # No of steps to simulate dynamics
    m = 1  # Unit mass
    l = 1.5  # length of string
    g = 9.8  # Gravity of earth
    
    def K(p):
        return 0.5* (p**2) / m
    
    def U(x):
        epsilon_h = l * (1 - np.cos(np.arcsin(x/l)))
        return m * g * epsilon_h
    
    def dU(x):
    return (m * g * x) / np.sqrt(l**2 - x**2)  # dU/dx; same as m*g*l*x / (l*sqrt(l**2 - x**2))
    
    x0 = 0.4
    p0 = 0
    plt.ion() ; plt.figure(figsize=(14, 10))
    # Take first half step for momentum
    pStep = p0 - (epsilon / 2) * dU(x0)
    # Take first full step for position
    xStep = x0 + epsilon * pStep
    # Take full steps
for step in range(num_steps):  # loop variable renamed to avoid shadowing num_steps
        # Update momentum and position
        pStep = pStep - epsilon * dU(xStep)
        xStep = xStep + epsilon * (pStep / m)
        # Display
        plt.subplot(121); plt.cla(); plt.hold(True)
        theta = np.arcsin(xStep / 1.5)
        y_coord = 1.5 * np.cos(theta)
        x = np.linspace(0, xStep, 1000)
        y = np.tan(0.5*np.pi - theta) * x
        plt.plot(0, 0, 'k+', markersize=10)
        plt.plot(x, y, c='black')
        plt.plot(x[-1], y[-1],'bo', markersize=8)
        plt.xlim([-1, 1]); plt.ylim([2, -1]); plt.hold(False)
        plt.title("Simple Pendulum")
        plt.subplot(222); plt.cla(); plt.hold(True)
        potential_energy = U(xStep)
        kinetic_energy = K(pStep)
        plt.bar(0.2, potential_energy, color='r')
        plt.bar(0.2, kinetic_energy, color='k', bottom=potential_energy)
        plt.bar(1.5, kinetic_energy+potential_energy, color='b')
        plt.xlim([0, 2.5]); plt.xticks([0.6, 1.8], ('U+K', 'H'))
        plt.ylim([0, 0.8]); plt.title("Energy"); plt.hold(False)
        plt.subplot(224); plt.cla()
        plt.plot(xStep,pStep,'ko', markersize=8)
        plt.xlim([-1.2, 1.2]); plt.ylim([-1.2, 1.2])
        plt.xlabel('position'); plt.ylabel('momentum')
        plt.title("Phase Space")
        plt.pause(0.005)
    # The last half step for momentum
    pStep = pStep - (epsilon/2) * dU(xStep)
    

    simple pendulum

The sub-plot in the upper right half of the output demonstrates the trade-off between the potential and kinetic energy described by Hamiltonian dynamics. The red portion of the first bar plot represents potential energy and black represents kinetic energy. The second bar plot represents the Hamiltonian. We can see that at $x = 0$ the potential energy is zero and the kinetic energy is maximum, and vice-versa at the extreme positions of the swing. The lower right sub-plot shows the phase space, showing how momentum and position vary. We can see that the phase space maps out an ellipse without deviating from its path. In the case of the Euler method the particle doesn't fully trace an ellipse and instead slowly diverges towards infinity (look here for further detail).

We can also see that the value of the Hamiltonian is not constant but oscillates slightly. This energy drift is due to the approximations used to discretize time. One can clearly see that the values of position and momentum are not completely random, but follow a deterministic, roughly circular trajectory. If we use the leapfrog method to propose future states, then we can avoid the random-walk behavior which we saw in the Metropolis-Hastings algorithm.

    Hamiltonian and Probability: Canonical Distributions

Now, having a bit of understanding of what the Hamiltonian is and how we can simulate Hamiltonian dynamics, we need to understand how we can use these Hamiltonian dynamics for MCMC. We need to develop some relation between a probability distribution and the Hamiltonian so that we can use Hamiltonian dynamics to explore the distribution. To relate $H(x, p)$ to the target distribution $P(x)$ we use a concept from statistical mechanics known as the canonical distribution. For any energy function $E(\theta)$, defined over a set of variables $\theta$, we can find a corresponding distribution

$P(\theta) = \frac{1}{Z} e^{-E(\theta)/T}$,

where $Z$ is a normalizing constant called the partition function and $T$ is the temperature of the system. For our use case we will consider $T = 1$.

Since the Hamiltonian is an energy function for the joint state of position $x$ and momentum $p$, we can define a joint distribution for them as follows:

$P(x, p) = \frac{1}{Z} e^{-H(x, p)}$

Since $H(x, p) = U(x) + K(p)$, we can write the above equation as

$P(x, p) = \frac{1}{Z} e^{-U(x)} e^{-K(p)}$

Furthermore, we can associate a probability distribution with each of the potential and kinetic energies ($P(x)$ with the potential energy and $P(p)$ with the kinetic energy). Thus, we can write the above equation as:

$P(x, p) = \frac{1}{Z'} P(x) P(p)$,

where $Z'$ is a new normalizing constant. Since the joint distribution factorizes over $x$ and $p$, we can conclude that $x$ and $p$ are independent. Because of this independence we can choose any distribution from which we want to sample the momentum variable. A common choice is to use a zero mean and unit variance Normal distribution (look at the previous post). The target distribution of interest, from which we actually want to sample, is associated with the potential energy: $U(x) = -\log P(x)$.

Thus, if we can calculate $\frac{\partial \log P(x)}{\partial x_i}$, then we are in business and we can use Hamiltonian dynamics to generate samples.

    Hamiltonian Monte Carlo

In Hamiltonian Monte Carlo (HMC) we start from an initial state $(x_0, p_0)$, and then we simulate Hamiltonian dynamics for a short time using the leapfrog method. We then use the state of the position and momentum variables at the end of the simulation as our proposed state $(x^*, p^*)$. The proposed state is accepted using an update rule analogous to the Metropolis acceptance criterion.

Let's look at the HMC algorithm:

Given an initial state $x_0$, stepsize $\epsilon$, number of steps $L$, log density function $\log P(x)$, and number of samples $M$ to be drawn:

1. set $m = 0$
2. repeat until $m = M$

  • set $m \leftarrow m + 1$

  • Sample a new initial momentum $p_0 \sim \mathcal{N}(0, I)$

  • Set $x_m \leftarrow x_{m-1}$, $x' \leftarrow x_{m-1}$, $p' \leftarrow p_0$

  • repeat for $L$ steps

    • Set $x', p' \leftarrow \text{Leapfrog}(x', p', \epsilon)$
  • Calculate the acceptance probability $\alpha = \min\left(1, \frac{\exp\left(-U(x') - K(p')\right)}{\exp\left(-U(x_{m-1}) - K(p_0)\right)}\right)$

  • Draw a random number u ~ Uniform(0, 1)

  • if $u \leq \alpha$ then $x_m \leftarrow x'$

$\text{Leapfrog}(x, p, \epsilon)$ is a function that runs a single iteration of the leapfrog method.

In practice, instead of explicitly giving the number of steps $L$, we sometimes use the trajectory length $\lambda = L \cdot \epsilon$, which is the product of the number of steps $L$ and the stepsize $\epsilon$.
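
Before switching to the pgmpy implementation, here is a bare-bones numpy sketch of the loop above (illustrative only, with unit mass and no momentum negation since $K$ is symmetric; hmc is just a hypothetical helper name, not the pgmpy API):

import numpy as np

def hmc(U, grad_U, x0, n_samples=1000, epsilon=0.25, n_steps=40):
    """Draw samples with plain HMC; U(x) = -log P(x), grad_U its gradient."""
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p0 = np.random.randn(*x.shape)            # sample momentum ~ N(0, I)
        x_new, p = x.copy(), p0.copy()
        p -= 0.5 * epsilon * grad_U(x_new)        # first half step for momentum
        for step in range(n_steps):
            x_new += epsilon * p                  # full step for position (unit mass)
            if step != n_steps - 1:
                p -= epsilon * grad_U(x_new)      # full step for momentum
        p -= 0.5 * epsilon * grad_U(x_new)        # last half step for momentum
        h_old = U(x) + 0.5 * p0 @ p0              # Hamiltonian at the current state
        h_new = U(x_new) + 0.5 * p @ p            # Hamiltonian at the proposal
        if np.random.rand() < np.exp(h_old - h_new):   # Metropolis acceptance
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# e.g. for the correlated Gaussian below: with inv_cov = np.linalg.inv(covariance),
# U = lambda x: 0.5 * x @ inv_cov @ x and grad_U = lambda x: inv_cov @ x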

Let's use this HMC algorithm to draw samples from the same multivariate distribution we used in the previous post:

$X \sim \mathcal{N}(\mu, \Sigma)$, where

$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0.97 \\ 0.97 & 1 \end{bmatrix}$

    I’m going to use HMC implementation from pgmpy, which I have implemented myself.

Here is the Python code for that:

    from pgmpy.inference.continuous import HamiltonianMC as HMC, LeapFrog, GradLogPDFGaussian
    from pgmpy.factors import JointGaussianDistribution
    import numpy as np
    import matplotlib.pyplot as plt
    
    np.random.seed(77777)
    # Defining a multivariate distribution model
    mean = np.array([0, 0])
    covariance = np.array([[1, 0.97], [0.97, 1]])
    model = JointGaussianDistribution(['x', 'y'], mean, covariance)
    
    # Creating a HMC sampling instance
    sampler = HMC(model=model, grad_log_pdf=GradLogPDFGaussian, simulate_dynamics=LeapFrog)
    # Drawing samples
    samples = sampler.sample(initial_pos=np.array([7, 0]), num_samples = 1000,
                             trajectory_length=10, stepsize=0.25)
    plt.figure(); plt.hold(True)
    plt.scatter(samples['x'], samples['y'], label='HMC samples', color='k')
    plt.plot(samples['x'][0:100], samples['y'][0:100], 'r-', label='First 100 samples')
    plt.legend(); plt.hold(False)
    plt.show()
    

    HMC_2D_samples

    If one compares these results to what we saw in the previous post for the Metropolis-Hastings algorithm, it is clear that HMC converges towards the target distribution a lot faster than Metropolis-Hastings. On careful inspection we can also see that the graph looks a lot denser than that of Metropolis-Hastings, which means that most of our samples are accepted (high acceptance rate).

    Though the performance of HMC might seem better, it depends critically on the trajectory length and stepsize. A poor choice of these can lead to a high rejection rate or too much computation time. One can see this for oneself by changing both parameters in the above example. For example, when I just changed the stepsize from 0.25 to 0.5 in the above example, nearly all samples were rejected. Although the stepsize parameter of the HMC implementation is optional, I do not suggest leaving it unspecified.

    In pgmpy we have implemented another variant of HMC in which we adapt the stepsize during the course of sampling, which completely eliminates the need to specify the stepsize (but still requires the trajectory length to be specified by the user). This variant of HMC is known as Hamiltonian Monte Carlo with dual averaging. In pgmpy we have also provided an implementation of the Modified Euler method for simulating Hamiltonian dynamics. (By default both algorithms use Leapfrog. It is not recommended to use the Modified Euler or Euler methods, because their trajectories are not elliptical and they thus perform poorly in comparison to the Leapfrog method.) Here is a code snippet showing how to use the HMCda algorithm in pgmpy.

    # Using JointGaussianDistribution from above example
    from pgmpy.inference.continuous import HamiltonianMCda as HMCda, ModifiedEuler
    # Using Modified Euler instead of the default Leapfrog
    sampler_da = HMCda(model, GradLogPDFGaussian, simulate_dynamics=ModifiedEuler)
    # num_adapt is number of iteration to run adaptation of stepsize
    samples = sampler_da.sample(initial_pos=np.array([7, 0]), num_adapt=10,num_samples=10, trajectory_length=10)
    print(samples)
    

    Both of these algorithms (HMC and HMCda) require some hand-tuning from the user, which can be time consuming, especially for high-dimensional, complex models. The No-U-Turn Sampler (NUTS) is an extension of HMC that eliminates the need to specify the trajectory length, but still requires the user to specify a stepsize. With the dual averaging algorithm, NUTS can run without any hand-tuning at all, and the samples generated are at least as good as finely hand-tuned HMC.

    NUTS removes the need for the number-of-steps parameter by using a metric to evaluate whether we have run the Leapfrog algorithm for long enough, that is, whether running the simulation for more steps would no longer increase the distance between the proposed value of x and the initial value of x.

    At a high level, NUTS uses the Leapfrog method to trace out a path forward and backward in fictitious time, first running forwards or backwards 1 step, then forwards or backwards 2 steps, then forwards or backwards 4 steps, etc. This doubling process builds a balanced binary tree whose leaf nodes correspond to position-momentum states. The doubling process is halted when the sub-trajectory from the leftmost to the rightmost nodes of any balanced subtree of the overall binary tree starts to double back on itself (i.e., the fictional particle starts to make a "U-Turn"). At this point NUTS stops the simulation and samples from among the set of points computed during the simulation, taking care to preserve detailed balance.

    The API (in pgmpy) for NUTS and NUTS with dual averaging is quite similar to that of HMC. Here is an example:

    from pgmpy.inference.continuous import (NoUTurnSampler as NUTS, GradLogPDFGaussian,
                                            NoUTurnSamplerDA as NUTSda)
    from pgmpy.factors import JointGaussianDistribution
    import numpy as np
    import matplotlib.pyplot as plt
    # Creating model
    mean = np.array([0, 0, 0])
    covariance = np.array([[6, 0.7, 0.2], [0.7, 3, 0.9], [0.2, 0.9, 1]])
    model = JointGaussianDistribution(['x', 'y', 'z'], mean, covariance)
    # Creating a sampling instance for NUTS
    sampler = NUTS(model=model, grad_log_pdf=GradLogPDFGaussian)
    samples = sampler.sample(initial_pos=np.array([1, 1, 1]), num_samples=1000, stepsize=0.4)
    # Plotting trace of samples
    labels = plt.plot(samples)
    plt.legend(labels, model.variables)
    plt.title("Trace plot of NUTS samples")
    plt.show()
    
    # Creating a sampling instance of NUTSda
    sampler_da = NUTSda(model=model, grad_log_pdf=GradLogPDFGaussian)
    samples = sampler_da.sample(initial_pos=np.array([0, 1, 0]), num_adapt=1000, num_samples=1000)
    # Plotting trace of samples
    labels = plt.plot(samples)
    plt.legend(labels, model.variables)
    plt.title("Trace plot of NUTSda samples")
    plt.show()
    

    The samples returned by all four algorithms come in one of two types, depending on the installation available. If the working environment has pandas installed, the samples will be returned as a pandas.DataFrame object; otherwise as a numpy.recarray object. At the moment pgmpy has pandas as a strict dependency, so the samples returned will always be a DataFrame object, but in the near future pandas will no longer be a strict dependency.

    Apart from the sample method, all four implementations have another method named generate_sample, each iteration of which yields a sample as a plain numpy.array object. This method is useful if one wants to work on a single sample at a time. The API for generate_sample is exactly the same as that of the sample method.

    # Using the above sampling instance of NUTSda
    gen_samples = sampler_da.generate_sample(initial_pos=np.array([0, 1, 0]),
                                             num_adapt=10, num_samples=10)
    samples = np.array([sample for sample in gen_samples])
    print(samples)
    

    pgmpy also provides base class structures so that user-defined methods can be plugged in. Let's look at an example of how we can do that. In this example the distribution we are going to sample from is the logistic distribution. The probability density of the logistic distribution is given by:

    p(x) = exp(-(x - μ) / s) / (s * (1 + exp(-(x - μ) / s))^2)

    Thus the log of this probability density function (related to the potential energy by U(x) = -log p(x)) can be written as:

    log p(x) = -(x - μ) / s - log(s) - 2 * log(1 + exp(-(x - μ) / s))

    And its gradient:

    d/dx log p(x) = -1/s + (2/s) * exp(-(x - μ) / s) / (1 + exp(-(x - μ) / s))

    import numpy as np
    from pgmpy.factors import ContinuousFactor
    from pgmpy.inference.continuous import NoUTurnSamplerDA as NUTSda, BaseGradLogPDF
    import matplotlib.pyplot as plt
    
    # Creating a Logistic distribution with mu = 5, s = 2
    def logistic_pdf(x):
        power = - (x - 5.0) / 2.0
        return np.exp(power) / (2 * (1 + np.exp(power))**2)
    # Calculating log of logistic pdf
    def log_logistic(x):
        power = - (x - 5.0) / 2.0
        return power - np.log(2.0) - 2 * np.log(1 + np.exp(power))
    # Calculating gradient log of logistic pdf
    def grad_log_logistic(x):
        power = - (x - 5.0) / 2.0
        return - 0.5 - (2 / (1 + np.exp(power))) * np.exp(power) * (-0.5)
    
    # Creating a logistic model
    logistic_model = ContinuousFactor(['x'], logistic_pdf)
    
    # Creating a class using base class for gradient log and log probability density function
    class GradLogLogistic(BaseGradLogPDF):
    
        def __init__(self, variable_assignments, model):
            BaseGradLogPDF.__init__(self, variable_assignments, model)
            self.grad_log, self.log_pdf = self._get_gradient_log_pdf()
    
        def _get_gradient_log_pdf(self):
            return (grad_log_logistic(self.variable_assignments),
                    log_logistic(self.variable_assignments))
    
    # Generating samples using NUTS with dual averaging
    sampler = NUTSda(model=logistic_model, grad_log_pdf=GradLogLogistic)
    samples = sampler.sample(initial_pos=np.array([0.0]), num_adapt=10000,
                             num_samples=10000)
    
    x = np.linspace(-30, 30, 10000)
    y = [logistic_pdf(i) for i in x]
    plt.figure()
    plt.hold(1)
    plt.plot(x, y, label='real logistic pdf')
    plt.hist(samples.values, normed=True, histtype='step', bins=100, label='Samples NUTSda')
    plt.legend()
    plt.hold(0)
    plt.show()
    

    logistics_NUTSda

    Ending Note

    In this blog post we saw how, by avoiding random-walk behavior, we can explore the target distribution efficiently using powerful algorithms like Hamiltonian Monte Carlo and the No-U-Turn Sampler. In my next blog post I hope to show a not-so-common yet interesting application of MCMC which I came across recently.

    August 15, 2016 12:00 AM

    August 14, 2016

    Adhityaa Chandrasekar
    (coala)

    GSoC '16: Final Report

    GSoC 2016 was one of the best things I've had the opportunity to participate in. I've learned so much, had a lot of fun with the community the whole time, got to work on something that I really like and care about, got the once-in-a-lifetime opportunity to visit Europe, and still got paid in the end. And none of this would have been possible without the support and help from the coala community as a whole. Especially Lasse, who was my mentor for the program, from whom I've learned so, so much. And Abdeali, who introduced me to coala in the first place and helped me get settled in the community. It honestly wouldn't have been possible without either of them, and I really mean it. Seriously, thank you :)

    List of commits I've made over the summer

    The last three months have been action packed. Check 'em out for yourself:

    coala-quickstart

    Commit SHA Commit
    b8d8349 Add tests directory for testing
    df99516 py.test: Execute doctests for all modules
    3d01aed Create coala-quickstart executable
    28a33f9 Add coala bear logo with welcome message
    759e445 generation: Add validator to ensure path is valid
    111d984 generation: Identify most used languages
    4ace132 generation: Ask about file globs
    8f7fe23 generation: Identify relevant bears and show help
    839fa19 FileGlobs: Simplify questions
    7c98e48 Settings: Generate sections for each language
    b28e20c Settings: Write to coafile
    69a5d2f Generate coafile with basic settings
    60bee9a Extract files to ignore from .gitignore
    62978ad Change requirements
    36c8486 Enable coverage report
    d78e85e Bears: Change language used in tests
    4a8819e setup.py: Add myself to the list of maintainers
    54f21c6 gitignore: Ignore .egg-info directories
    6a7b63a Bears: Use only important bears for each language

    coala

    Commit SHA Commit
    45bfec9 Processing: Reuse file dicts loaded to memory
    ef287a4 ConsoleInteraction: Sort questions by bear
    7d57784 Caching: Make caching default
    1732813 Processing: Switch log message to debug
    01890c2 CachingUtilitiesTest: Use Section
    868c926 README: Update it
    f79f53e Constants: Add strings to binary answers
    2d7ee93 LICENSE: Remove boilerplate stuff
    da6c3eb Replace listdir with scandir
    ad3ec72 coalaCITest: Remove unused imports
    91c109d Add option to run coala only on changed files
    5a6870c coala: Add class to collect only changed files
    622a3e5 Add caching utilities
    e1b3594 Tagging: Remove Tagging

    coala-utils

    Commit SHA Commit
    27ee83c Update version
    64b0e0b Question: Validate the answer
    1046c29 VERSION: Bump version
    bd1e8fa setup.cfg: Enable coverage report
    79fee96 Question: Use input instead of prompt toolkit
    cfd81c1 coala_utils: Move ContextManagers from coalib
    c5a4526 Add MANIFEST
    f019962 Change VERSION
    9db2898 Add map between file extension to language name
    a52a309 coala_utils: Add Question module

    That's a +2633 / -471 change! I honestly didn't know it'd be that big. Anyway, those were the technical stats. On to the showcase!

    Stuff I worked on

    My primary GSoC proposal: coala-quickstart

    coala-quickstart

    And here's the coafile that's generated:

    Pretty neat stuff, huh? :)

    Anyway, that was my whole project in a nutshell. I worked on other stuff too during the coding period. Here are some of the results:

    Caching in coala

    This is another thing I'm proud of: caching in coala. Remember how you had to lint all your files every time even if you changed just one line? No more. With caching, coala will only collect those files that have changed since the last run. This produces a terrific improvement in speed:

    Trial 1 Trial 2 Trial 3 Average
    Without caching 9.841 9.594 9.516 9.650
    With caching 3.374 3.341 3.358 3.358

    That's almost a 3x improvement in speed!

    Initially, caching was an experimental feature since we didn't want to break stuff! And this can break a lot of stuff. But fortunately, everything went perfectly smoothly and caching was made default.

    README overhaul

    The coala README page got a complete overhaul. I placed a special emphasis on simplicity and the design; and to be honest, I'm quite happy with the outcome.

    Other miscellaneous stuff

    I worked on other tiny things during the coding phase:

    • #2585: This was a small bugfix (to my annoyance, introduced by me). This also led to a performance improvement.
    • #2322: scandir is a new Python 3.5 feature that is faster than the traditional listdir used to get a directory's contents.
    • e1b3594: I removed Tagging with this commit. It was unused.
    • #11, #14: A generic tool to ask the user a question and return the answer in a formatted manner. This is now used in several packages across coala.

    There were other tiny changes, but you can find them in the commit list.

    Conclusion

    It's really been a blast, right from the start to the finish. Thanks to everyone who has helped me in any way. Thanks to Google for sponsoring such an awesome program. Thanks to the PSF for providing coala with an opportunity at GSoC. I honestly can't see how this would have been possible without any of you.

    To everyone else, I really recommend contributing to open-source. It doesn't have to be coala. It doesn't even need to be a big project. Just find a project you like: it can even be a silly project that doesn't do anything useful. The whole point is to get started. GSoC is one way to easily do that. There is such a wide variety of organizations and projects, I'm pretty sure at least one project will be to your liking. And you're always welcome at coala. Just drop by and say hello at our Gitter channel.

    Adhityaa

    August 14, 2016 06:30 PM

    liscju
    (Mercurial)

    Coding Period XI - XII Week

    In this week I was deciding how to propagate authorization information about the redirection server from the main repo server to clients. From my investigation and from talks with developers on the #glyph channel it seems the best decision is to make sure that the redirection server certificate CA is either:
    1) a well-known CA
    2) the same CA that was used to sign the certificate for the main repo server

    From a talk with my mentor we decided that the best I can do right now is to test the solution and prepare the feature to be production ready. The redirection feature now has all the functionality we planned, so this seems reasonable.

    by Piotr Listkiewicz (noreply@blogger.com) at August 14, 2016 06:11 AM

    srivatsan_r
    (MyHDL)

    12 Weeks and Counting….!

    Well, unofficially this is my 14th coding week of GSoC, since I started coding two weeks early. It has been a very good experience working with MyHDL. It was fun and challenging during both my first project and the second one.

    I learnt a lot of new things, like how an open source project is packaged and distributed, how a project should be structured, why tests are important and how continuous integration tools help.

    I was wondering whether I would have gained the same level of knowledge if I had contributed to some other open source organisation, and I realised that it would not have been the case. I was having a chat with a friend who was working with another sub-org under the Python Software Foundation. I asked him how many lines of code he wrote during GSoC (though this may not be the exact measure of the work done, it still gives an approximate measure), and he said "around 500!". I was like "Just 500?", because I wrote around 3000 lines for my first project alone. Just then, I realised that I have contributed a lot to MyHDL and it has given me a lot of learning experience in return.

    Most importantly, I should thank my mentors (Mr. Eldon Nelson and Mr. Christopher Felton); they both were very, very supportive. Eldon was very motivating and always kept a check on how much of my project I had completed, and Chris helped me with debugging errors and clearing my doubts. I bet I would not have got such wonderful mentors in any other GSoC organisation.

    Coming back to my GSoC update: my college started two weeks ago, so the GSoC progress has been a little delayed. We are left with the final overall core test. College has started for both my partner and me, and we have college assignments to complete, so progress is a little slower than usual.

    In my first project, I have a PR which is not yet merged. I'm waiting for my mentor to give me the green signal to merge it.


    by rsrivatsan at August 14, 2016 03:05 AM

    August 13, 2016

    chrisittner
    (pgmpy)

    HillClimbEstimator done

    pgmpy now has a basic hill climb BN structure estimator.

    Usage:

    import pandas as pd
    import numpy as np
    from pgmpy.estimators import HillClimbSearch, BicScore
    
    # create data sample with 9 random variables:
    data = pd.DataFrame(np.random.randint(0, 5, size=(5000, 9)), columns=list('ABCDEFGHI'))
    # add 10th dependent variable
    data['J'] = data['A'] * data['B']
    
    est = HillClimbSearch(data, scoring_method=BicScore(data))
    best_model = est.estimate()
    
    print(sorted(best_model.nodes()))
    print(sorted(best_model.edges()))
    

    Output:

    ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
    [('A', 'J'), ('B', 'J')]
    

    by Chris Ittner at August 13, 2016 10:00 PM

    Preetwinder
    (ScrapingHub)

    GSoC-5

    Hello,
    This post continues my updates on my work porting frontera to Python 2/3 dual support.
    Only 3 days remain till the beginning of the final clean-up period. The testing work is entirely finished, although there are some other modules I might add tests for later. The porting work is also almost done. I have a PR which ports the final components (workers, middlewares) and will most probably be merged on Monday. The only remaining modules after that are the messagebus modules. I'll be making a PR with these components on Monday, and it should be merged in the next day or two. I plan to spend the next week making changes to the documentation and performing tests for various deployment configurations to weed out any problems that might have persisted. If things go according to plan, we should be able to make a release with Python 3 support soon after the work period ends (23 August).

    GSoC-5 was originally published by preetwinder at preetwinder on August 13, 2016.

    by preetwinder (you@email.com) at August 13, 2016 02:35 PM

    Adrianzatreanu
    (coala)

    It has come to an end..

    In this blog post I will write a short summary in which I will present my GSoC progress so far and include some proofs to what I have worked on.

    Currently, most of my project’s aims have been achieved. However, there’s still some little work to be done. I loved working on my project and I really had fun, while I learnt a lot of cool stuff!

    If I had to approximate, I'd say I have achieved around 80% of what I proposed to do this summer.

    Why have I not finished it? Because I wanted to do too much?

    No. This is not the reason. The real reason for not finishing is that I've spent a lot of time designing and thinking ahead about the tools and how everything was going to fit together, because I wanted everything to be well thought out and actually usable. Instead of blindly going with the first option for uploading or installing, I took some time and discussed with everyone around coala how this was going to be done and why. This thinking took a lot of time.

    Another reason for still not having everything done is that the review process takes a while. Here, at coala, we do not merge things just to have something partially usable. We review it. Hard. Until every line of code is used at its maximum efficiency and makes sense and is optimized.

    What I have already done with success

    I have populated 90% of the bears with their REQUIREMENTS as a metadata attribute. This REQUIREMENTS attribute is a tuple that contains instances of PackageRequirement classes, which hold package names, versions and package managers. I have also created metadata attributes specific to each bear and populated all the bears with them.

    I have created a tool that uploads ALL bears correctly to PyPI, taking data from the bears themselves, including from the metadata attributes I’ve written to them.

    I have created a tool that installs bears while interacting with the user, giving them the option to install ALL bears, SOME bears or none. This tool will also install the REQUIREMENTS, gathered from the attribute present in most bears, using an installation_method() that each PackageRequirement instance has, specific to that manager.

    I have tied the bears to be discovered by coala using entry points, as coala gathers bears by searching for installed PyPI packages that have the 'coalabears' entry point.

    What is left to do

    However, with all the work done, there are still some things that I'd love to do next!

    Firstly, I'd love to make some cool packages out of existing bears that would be shown to the user, such as a Web Development bears package which would include the JavaScript, CSS and HTML bears.

    Also, I will make some cool improvements and enhancements to the installation tool, some of which I started working on, and some of which will be shown here https://gitlab.com/coala/bear_installation_tool/issues . Some of these enhancements include:

    • Changing the output given by PyPI to a cooler output
    • Showing all bears that failed installation at the end as a list
    • Fix a bug in which coala does not correctly find all bear packages installed

     

    For a full list of my work this summer, these links can be consulted:


    by adrianzatreanu at August 13, 2016 02:29 PM

    kaichogami
    (mne-python)

    Work After Mid-Term

    Hello all!
    I completely forgot about posting about my progress after mid term. I hope I make up for that with this post.
    Till my mid term, I was working on refactoring Xdawn, which was huge and definitely complicated for a beginner like me. However, with patience and huge help from my mentors, I made some valid contributions! :)

    Moving forward, I started working on a class called Vectorizer, which makes scikit-learn pipelines compatible with MNE transformers, which typically work with 3D matrices.
    Next I worked on a general class (the PR shows as closed, but it was merged while rebasing) which applies reduction/decomposition algorithms to MNE data using scikit-learn transformers. This work was started by my mentor Jean, and I improved it and extended its functionality.
    I am currently working on refactoring FilterEstimator, which I will cover in the next blog post. Thanks for reading.
    Have a nice day!


    by kaichogami at August 13, 2016 04:57 AM

    tushar-rishav
    (coala)

    Pipes in Linux

    In this blog post, I'd like to share a bit about pipes - an interesting feature of Unix/Linux operating systems (it's available in other systems too). Having talked about pipes briefly, I will do an implementation in Python. So let's get started!

    Pipe

    Brief History

    Pipes are the oldest IPC tools; they were proposed by Douglas McIlroy after he noticed that much of the time people were processing the output of one process as the input to another. Later, Ken Thompson added the concept of pipes to the UNIX operating system.

    About

    In simple terms, a pipe is a method of connecting the standard output of one process to the standard input of another. A quick example:

    ls . | sort

    In the above shell command (reading from left to right), we are passing the output of the ls command as input to the sort command. The combined output is the current directory's contents in sorted order. The vertical bar ( | ) is the pipe character. It acts as a method of one-way (or half-duplex) communication between processes.
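    For comparison, here is a minimal sketch of how the same pipeline could be wired up programmatically from Python with the subprocess module (purely illustrative, not part of the examples later in this post):

    import subprocess

    # Replicate `ls . | sort`: connect ls's stdout to sort's stdin.
    ls = subprocess.Popen(['ls', '.'], stdout=subprocess.PIPE)
    sort = subprocess.Popen(['sort'], stdin=ls.stdout, stdout=subprocess.PIPE)
    ls.stdout.close()              # allow ls to get SIGPIPE if sort exits early
    output, _ = sort.communicate()
    print(output)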

    There are basically two types: anonymous pipes and named pipes (FIFOs).

    The one we just saw, was an anonymous pipe or half-duplex pipe.

    Pipe creation

    When a process creates a pipe, the kernel sets up two file descriptors (read and write) for use by the pipe. A pipe initially connects a process to itself, and any data traveling through the pipe moves through the kernel. Under Linux, pipes are actually represented internally with a valid inode which resides within the kernel itself, and not within the bounds of any physical file system. Well, you might ask, what's the point of a pipe if it connects a process to itself? Is it just going to communicate with itself? The answer is no. Pipes are useful when we fork a child process: as we know, a child process inherits any open file descriptors from the parent, allowing us to set up multiprocess communication (in this case between the child and parent process). As both processes have access to the file descriptors, a pipeline is set up.

    One important thing to note is that, since the pipe resides within the confines of the kernel, any process that is not in the ancestry of the creator of the pipe has no way of addressing it. This is not the case with named pipes (FIFOs), which we will discuss next.

    Named Pipes

    Unlike anonymous pipes, a named pipe exists in the file system. After input-output has been performed by the sharing processes, the pipe still exists in the file system independently of the processes, and can be used for communication between other processes.
    We can create a named pipe using either the mkfifo or mknod shell command (Python has an inbuilt method which we will see during the implementation).
    Just like a regular file, we can set file permissions on a named pipe. Check out mode in man mkfifo.
    A quick example of named pipes that we might have often come across.

    cat <(ls -li)

    Here, the output from ls -li is redirected to a temporary named pipe, which the shell creates, names and later deletes. Another fun example is to create a very basic shared terminal. Let's try it out.
    The idea is to create a named pipe and then use two separate cat processes to read/write data from/to the named pipe.

    • Creating a named pipe (from console) :

      mkfifo named_pipe_file_name # create a named pipe
      # or
      mknod named_pipe_file_name p

      If you observe closely, the output from ls -l looks like:

      prw-r--r--. 1 tushar tushar 0 Aug 13 05:32 named_pipe_file_name|

    You may have noticed an additional | character shown next to named_pipe_file_name, and the file permissions start with p. This is a Linux clue that named_pipe_file_name is a pipe.

    • Using it:

      cat <named_pipe_file_name # read
      # Run the next command in a separate terminal instance.
      cat >named_pipe_file_name # write
      # Now type your heart out! :)

      You might notice that after the first command the execution appears to be blocked. This happens because the other end of the pipe is not yet connected, so the kernel suspends the first process until the second process opens the pipe.

    I hope that was a simple enough demonstration of named pipes and that it helped you understand them.

    Now (for fun), let's implement pipes in Python. Since the code is in Python, I need not explain every line; it should be readable. :)
    The basic idea is to create two processes (parent and child) and let the parent read the data written by the child process.

    Implementation in Python2.7
    • Anonymous pipe

      import os
      import time

      def child(pipeout):
          while True:
              time.sleep(1)
              os.write(pipeout, "Hello from child!")

      def parent():
          # The two file descriptors.
          pipein, pipeout = os.pipe()
          """
          Note: The pipe() call must be made before a call
          to fork or the descriptors will not be inherited
          by the child.
          """
          if os.fork() == 0:
              child(pipeout)
          else:
              while True:
                  msg = os.read(pipein, 128)
                  print('Parent {} got {} at {}'.format(os.getpid(),
                                                        msg,
                                                        time.time()))

      parent()
    • Named pipe

      The implementation is almost the same, except that instead of descriptors we have access to the named pipe's file name and use it to perform the I/O operations.

      import os
      import time

      pipe_name = 'named_pipe_file_name'

      def child():
          # Note the same file name is being used for I/O.
          pipeout = os.open(pipe_name, os.O_WRONLY)
          while True:
              time.sleep(1)
              os.write(pipeout, 'Hello from child!\n')

      def parent():
          pipein = open(pipe_name, 'r')
          while True:
              msg = pipein.readline()[:-1]
              print('Parent {} got {} at {}'.format(os.getpid(),
                                                    msg,
                                                    time.time()))

      if not os.path.exists(pipe_name):
          os.mkfifo(pipe_name)  # Creates a file on disk.
      pid = os.fork()
      if pid != 0:
          parent()
      else:
          child()

    Phew! That was fun!

    Cheers!

    August 13, 2016 03:03 AM

    GSoC Experience

    Well, it's been quite a while since the last blog post. This post and the following ones are going to be about my overall experience and the things I have learnt over the past 11 weeks while contributing to coala-analyzer as a Google Summer of Code developer under the Python Software Foundation. The list is long, so I won't fit it all in a single post. :)

    EuroPython’16 Experience

    Recently, I attended the EuroPython conference at Bilbao, Spain, where I had a chance to meet a few cool fellow coalaians ( @sils1297, @sims1253, @Udayan12167, @justuswilhelm, @Redridge, @Adrianzatreanu and @hypothesist ) and over a thousand Pythonistas | Pythoneers who shared their love of and experience with Python by presenting talks, lightning talks or training sessions. Sadly, I couldn't meet Attila Tovt, my amazing GSoC mentor.

    Being my first PyCon ever, I was a little nervous but curious about it. Having spent a day with the community made me feel comfortable. Seeing the energy that people shared, I was overwhelmed! In the following days, each morning started with a keynote speaker, followed by over a dozen talks throughout the day on various exciting topics like descriptors in Python, effective code review, AsyncIO, algorithmic trading with Python, deep learning with TensorFlow, "Gilectomy: overcoming the GIL in the CPython implementation" by Larry Hastings, and many more.

    Finally, the exploration ended with the workshop I conducted on how a novice can make a real contribution to an open source project. It was a learning experience and definitely memorable for me! Being in Europe for the first time, I was excited. People are friendly and the place is truly beautiful! :)

    GSoC’16 Experience

    Well, it has been a totally amazing and educational experience this summer. I learnt the best practices of a collaborative programmer and (probably) became one! Credits to my mentor - @Uran198 - who patiently and solicitously reviewed my PRs. I never really bothered to follow practices like atomic changes or TDD aiming for maximum coverage and good code quality until I started contributing. Honestly, following such practices seemed bloated and sometimes annoying at first, but once I got the hang of them, they became a habit. I think the crucial things I have learnt during the GSoC period are how to write code that is maintainable (docstrings, effective and atomic commits), testable (good code design, writing unit tests and impressive coverage) and follows the standards. Having learnt these skills, I look forward to sharing them with my friends and the community.
    The coming blog posts will cover these practices in detail. :)

    GSoC’16 status

    11 weeks are over, with a week remaining before submissions. The coala-html project is ready with this PR. I shall keep improving this project even after my GSoC period is over. Apart from coala-html, the coala website is almost ready, with some minor design work remaining. Soon enough I will submit it for review. :)

    Cheers!

    August 13, 2016 02:48 AM

    Sheikh Araf
    (coala)

    [GSoC16] Week 12 update

    It is the final week of GSoC and my project is almost complete. I'm still working on the coafile editor, but I haven't been able to devote much time to it since my summer vacation ended.

    Nevertheless, I've made some progress. Instead of implementing the GUI of the editor in one go, I'll first implement support for the coafile format. This will mostly include some syntax highlighting and content assist. The next step would then be to use this text editor inside a graphical editor.

    This will definitely require more than one week to implement, so I'll keep working on the project after the official deadline too. And of course the idea is to keep working on the plug-in and maintaining it.

    This is probably my last update on GSoC. It’s been really awesome and I’m thinking of writing another post about my overall experience with GSoC.

    August 13, 2016 01:10 AM

    mr-karan
    (coala)

    coala GSoC 2016 Summary

    This blog post is about my work done during the GSoC coding period (May 23 - Aug 15). Doing GSoC has been one of the most amazing experiences of my life. Thanks to PSF and coala for giving me the opportunity to work on an amazing open source project. Also a big thanks to Google for running GSoC and cultivating the culture of contributing to open source. I have learnt so many things over the span of a short three months that will definitely help me grow as a developer.

    Work Summary

    Link to Issue Link to PR Description Status
    #1925 #2569 Syntax Highlighting Merged
    #154 #2 coala_bears_create Merged
    - #2 Bear Docs Website Merged
    #31 #443 GoErrCheckBear Merged
    #400 #415 VerilogLintBear Merged
    #573 #581 WriteGoodLintBear Merged
    #588 #589 HappinessLintBear Merged
    #642 #643 package.json bug Merged
    #646 #667 VultureBear Merged
    #2574 #2590 ASCIINEMA_URL attribute Merged
    #658 #658 Add ASCIINEMA urls Merged
    #662 #663 ASCIINEMA urls fix Merged
    #2309 #2310 Warning Message for wrong linter Closed / Not Happening
    #611 #675 RustLintBear Open
    #601 #633 MyPyBear Open
    #629 #632 SpellCheckBear Open
    #596 #602 HTTPoliceLintBear Open

    List Of Commits

    coala

    Commits SHA Shortlog
    4e20e9a ConsoleInteraction: Add syntax highlighting
    14ece91 ConsoleInteraction: Add comment for line number
    b610ae4 coalib/bears/Bear: Add ASCIINEMA_URL attribute

    coala-bears

    Commits SHA Shortlog
    6bd715c bears/python: Add MyPyBear
    1ea2076 bears/python: Add VultureBear
    73374c3 bears: Fix ASCIINEMA_URLS
    0587654 bears: Add ASCIINEMA_URL
    67f9de3 requirements: Update coala version
    80aa542 package.json: Add name and version
    1ba1bfd codecov.yml: Fix it
    9c85fe3 bears/naturallanguage: Add WriteGoodLintBear
    0e057fc bears/js: Add HappinessLintBear
    77279d3 bears/go: Add GoErrCheckBear
    72e5d87 bears/verilog: Add VerilogLintBear

    coala-bear-management

    Commits SHA Shortlog
    516b50c Add files required for PyPi
    28d1064 Add .gitlab-ci.yml
    c252dd7 Add setup.cfg
    a5e4a3d coala_bears_create: Add main application
    8c87bfe Add requirements
    fa8fe8d Add README.rst
    624b247 gitignore: Remove useless entries

    website

    Commits SHA Shortlog
    d5e70d2 Add LangCtrl and language page view
    b7f53c6 Add DetailCtrl and detail page view
    8175882 Add BearCtrl and bear page views
    b113e2a Add main angular app and homepage
    66ee579 Add external stylesheets and images
    68cc4b0 data: Mock JSON output
    e0d47be bower: Add Vendor Dependencies

    Some fun stats with git log

    Repo LOC Added LOC Deleted
    coala 456 429
    coala-bears 1012 66
    website 543 0
    coala-bears-create 400 0

    That’s a +2411/-495 change over the span of three months :smile:

    A coala-bears template generator

    coala-bears-create is a tool which lets you create bears easily, by asking you questions and filling in a standard config file. The generated files can then be quickly completed with additional details to get your bear up and running in no time. The CLI is implemented with the help of Python Prompt Toolkit, which let me plug in features like a dropdown list, a status bar and a prompt method for asking the user questions.

    The templates are present in the scaffolding-templates directory. The user is asked to enter the directory where she wants to create the bear, and after she finishes answering all the questions, the data is taken from the scaffolding templates, her answers are filled into the templates and a new folder is created for her with the Bear and BearTest files.

    Advantage/Motivation: Previously, creating a bear meant you had to type or copy some standard boilerplate, like certain imports or linter variables which are shared across all bears. Using this application you just need to specify the values and you have a bear ready almost instantly, without any extra effort.

    Imgur Imgur Imgur

    Syntax Highlighting in coala

    Presently coala displays results like this:

    Imgur

    The task of this project was to implement syntax highlighting for the affected code, which brings some visual enhancement for the user. This task wasn't trivial, as we didn't have information about the language of the file being analyzed, and we needed dual highlighting (foreground & background colors). The aim of the task was to highlight the part which is in the sourcerange and provide syntax highlighting for the rest, since highlighting text and putting colors on top of it doesn't work well. I also had to figure out a way to still print spaces and tabs as unicode markers like in the previous version.

    I began to search for a library for syntax highlighting, as the #1 rule in software development is to use what already exists. I finally found the Pygments library to be ideal for this task.

    I used get_lexer_for_filename and fed sourcerange.file to it, to get the appropriate lexer based on the language of the file. If no suitable lexer is found, an exception is raised, which I catch to fall back to TextLexer (the plain-text lexer) as the default. I used the highlight method from the Pygments library to get a str object, which is basically my string wrapped in ANSI escape sequences, and used TerminalTrueColorFormatter to print the results in a terminal. There was another formatter, TerminalFormatter, but after some hair pulling (read: debugging) I realized I couldn't use it for background highlighting.

    Up to this point I was able to colorize the results, but one important missing piece was that there were no space/tab markers in the strings. In the old source code, there was a custom function which iterated over the string, and if a space or tab was found it was replaced by a unicode character that appears as a bullet mark in the terminal for a space and double right arrows for a tab. I couldn't do this any more because the str object I had now was also wrapped in ANSI escape sequences, and I spent almost half a day trying to find a workaround. Not satisfied with any of the methods I tried, I searched the Pygments docs in the hope of finding something. And voilà, I was delighted to see that there is a VisibleWhitespaceFilter which can be added to the lexer. It worked as expected, and this also let me remove an entire function from the original source code with an elegant one-line solution. I asked the maintainers of the original code and they were fine with this new change. I also highlighted the result.message using the same method. I changed the tests so that they work with Pygments highlighting and refactored the code a bit, which reduced the duplication of long highlight calls.
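    A minimal sketch of the Pygments pieces described above (illustrative only, not coala's actual ConsoleInteraction code; the file name and snippet are made up):

    from pygments import highlight
    from pygments.lexers import get_lexer_for_filename, TextLexer
    from pygments.formatters import TerminalTrueColorFormatter
    from pygments.filters import VisibleWhitespaceFilter
    from pygments.util import ClassNotFound

    filename = 'example.py'                        # hypothetical analyzed file
    code = "def add(a, b):\n\treturn a + b\n"      # hypothetical affected code

    try:
        lexer = get_lexer_for_filename(filename)   # pick lexer from the file name
    except ClassNotFound:
        lexer = TextLexer()                        # fall back to plain text

    # Render spaces/tabs as visible markers instead of post-processing the string
    lexer.add_filter(VisibleWhitespaceFilter(spaces=True, tabs=True))

    print(highlight(code, lexer, TerminalTrueColorFormatter()))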

    Imgur Imgur Imgur

    Bear Documentation Website

    The task was to create a website for all coala-bears documentation. I was initially clueless about which stack to choose, but after some discussion with @tushar-rishav and @sils1297, I decided to go with the awesome AngularJS because of its highly customizable filters. I used coala --show-bears json to get a JSON output of all the details/configs of the bears present. I parsed this data and used Materialize to display it as a list of cards, where the user can Read More about a bear on the next page or see some of the important details via the card-reveal option.

    You can take a look at the website here.

    This site will be deployed alongside coala-website which is Tushar’s project

    Some coala-bears I made

    I enjoy creating new bears for coala, as I religiously use them alongside my side projects. During my GSoC period, I created some bears. Some of them are completed and merged, while some of them are still open and I will continue to work on them after my GSoC is over.

    A list of them is available on top of the post.

    What I couldn’t do

    I have been able to complete most of my proposal tasks, and though I faced some difficulties as the project progressed, most of the crucial work has been completed and merged. However, I couldn't complete Navigation of Results and Embedded Source Code Linting, which were mentioned in my project. I hope to continue working on them after my project is over, as the philosophy of GSoC is to imbibe the culture of contributing continuously to open source and not be done with it in 3 months.

    The task of Navigation of Results was the idea that the user could go back and forth through results, but the current architecture and design in which coala presents results made this very difficult to achieve, and I couldn't come up with a good, clean approach for it.

    The problem with Embedded Source Code Linting was that there was no good approach for finding out which language is currently being analyzed. I did open an issue for a related task in which the user would be presented with an error message if she tries to use the wrong linter, but the status of the PR was changed to not happening/won't fix. This is because there is no single accurate approach for detecting the language; even MIME data fails most of the time. Due to these reasons and obstacles these tasks weren't done, but I am sure we can come up with better alternatives by setting a more achievable target.

    Other cool stuff I did

    I wanted to add asciinema URLs to the bear documentation website, but there wasn't a single place where I could access them, as they had all been tweeted out. I wrote a script to grab all the tweets and filter out the ones which had an asciinema URL in them.

    For fun, I also wrote a script which counts how many times you have been mentioned by @coala-analyzer. They are both located in the Twitter Scripts repository, in my branch.

    Credits

    A BIG shout-out to the tools which helped me achieve my tasks

    Also thanks to my mentor ManoranjanP, co-mentor Lasse, Mischa, AbdealiJK and the rest of the amazing coala community for helping me throughout the project and providing amazing and helpful reviews on all my PRs. I plan to stay in touch with the coala community after GSoC ends, so that we can kick some ass again.

    Happy Coding!

    August 13, 2016 12:10 AM

    Leland Bybee
    (Statsmodels)

    Significance Tests and Results Class

    Distributed Results Class

    Since the last update I have modified what the DistributedModel returns. Previously, it just returned the parameters, but I've changed things so that it now returns a results class instance, similar to the fit methods of other model classes. I've also made it possible for the user to change the results class used. The reasoning for this is that the sort of results we expect can change based on the methods used by the DistributedModel. For instance, since we default to a debiasing approach that uses the elastic_net code, we would expect a RegularizedResults instance as the default. This is what is set up, but I have also implemented a bare-bones DistributedResults class. Currently this doesn't have much, but it does allow for some flexibility if we want to add more down the line. To change the results class you simply give an additional argument to DistributedModel: results_class=<ResultsClass>.
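    As a rough sketch of what swapping the results class might look like (hedged: this is based on the description above and the in-progress PR, so the data-generator interface, fit keywords and argument names such as results_class are assumptions rather than the final API):

    # Hypothetical usage sketch of the DistributedModel / DistributedResults API
    # described above; names and signatures may differ in the merged version.
    import numpy as np
    from statsmodels.base.distributed_estimation import (
        DistributedModel, DistributedResults)

    rng = np.random.RandomState(0)

    def data_generator(n_partitions=2, nobs=50, k=3):
        # Yield (endog, exog) chunks, one per partition.
        for _ in range(n_partitions):
            exog = rng.randn(nobs, k)
            endog = exog.dot([1., 0., -1.]) + rng.randn(nobs)
            yield endog, exog

    # Default: debiased elastic-net estimation, RegularizedResults-style output
    mod = DistributedModel(2)
    res_default = mod.fit(data_generator(2), fit_kwds={"alpha": 0.5})

    # Swap in the bare-bones DistributedResults class instead
    mod_bare = DistributedModel(2, results_class=DistributedResults)
    res_bare = mod_bare.fit(data_generator(2), fit_kwds={"alpha": 0.5})
    print(res_bare.params)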

    Significance Testing

    Besides the work on distributed estimation, I have also put together a PR for the LASSO significance testing that was mentioned as part of the GSoC proposal. It is fairly straightforward. Currently, all I've added is the covariance test specified in http://statweb.stanford.edu/~tibs/ftp/covtest.pdf. This is built somewhat in parallel to the other contrasts used in statsmodels, but given that the setup is somewhat different I think this is important for now. I'd like to eventually go back and unify things somewhat, but I think that is beyond the scope of the GSoC work and may be something that carries over into the fall. At this point I just need to add the tests for this and I think it will be ready to go.

    August 13, 2016 12:00 AM

    August 12, 2016

    TaylorOshan
    (PySAL)

    Assessing Model Fit in Spatial Interaction Models

    One issue with spatial interaction models is how to compare the fit of different models. This can be especially tricky in the realm of generalized linear models, where the R-squared value does not have the same interpretation as in an OLS regression. Even in the context of OLS regression, previous work suggests the best way to assess a set of models is via a combination of R-squared, standardized root mean square error (SRMSE), and information-based statistics. In that vein, I have added the SRMSE to the gravity classes. I have also added several proxies for the R-squared value, since we cannot use the standard R-squared. First, I added a pseudo R-squared (McFadden's variety), which is based on a comparison of a model's full likelihood to its null likelihood. There is also an adjusted version, which penalizes the measure for model complexity; it has been added to the gravity classes as well. I have also added a D-squared metric, which may be interpreted as the percentage of deviance accounted for by the model. It is essentially a ratio of the model deviance to the null deviance. This measure also has an adjusted version to account for model complexity. The D-squared metric was added to the GLM class, and is also passed to the gravity class. Finally, the Sorensen similarity index (SSI) was added to the gravity class. This index has become popular in the mobility and network science literature, so it was added so that models can be compared against each other and against other metrics.
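    The definitions behind these measures are simple enough to sketch directly (an illustrative sketch of the formulas only, with made-up example numbers; this is not the actual gravity-class implementation):

    import numpy as np

    def srmse(observed, predicted):
        # Standardized root mean square error: RMSE divided by the mean flow.
        n = observed.shape[0]
        return np.sqrt(((observed - predicted) ** 2).sum() / n) / (observed.sum() / n)

    def pseudo_r2(llf, llnull, k=0, adjusted=False):
        # McFadden's pseudo R-squared; the adjusted form penalizes k parameters.
        if adjusted:
            return 1.0 - (llf - k) / llnull
        return 1.0 - llf / llnull

    def d_squared(deviance, null_deviance):
        # Share of deviance accounted for by the model.
        return 1.0 - deviance / null_deviance

    def ssi(observed, predicted):
        # Sorensen similarity index, averaged over flows.
        return np.mean(2.0 * np.minimum(observed, predicted) / (observed + predicted))

    obs = np.array([10., 40., 25., 5.])
    pred = np.array([12., 35., 27., 6.])
    print(srmse(obs, pred), ssi(obs, pred))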

    by Taylor Oshan at August 12, 2016 02:17 PM

    Karan_Saxena
    (italian mars society)

    Pre-final post

    Sorry for this (very) late post.

    My code was making progress so I was trying to fit in as many optimisations as possible before putting up this post. Guess this is high time I do it.

    More details in the final blog post :P

    So my code is (finally!!) working. The steps are being tracked and I am now able to get all the coordinates of the feet.

    Only the publishing of the output on Tango is left.


    Onwards and upwards!!

    by Karan Saxena (noreply@blogger.com) at August 12, 2016 01:59 PM

    aleks_
    (Statsmodels)

    Bugs again?!

    Implementing tests for Granger-causality and instantaneous causality for Vector Error Correction Models (VECM) (compare chapters 3.6 and 7.6 in [1]) resulted in hours of bug-searching.

    The search for bugs in my Granger-causality test started to come to an end when I realised that my results and those of the reference software JMulTi differed by a constant factor, namely T / (T - K*p - num_of_deterministic_terms). With this finding I could spot the source of the differing results: the covariance matrix of the residuals. While I was using the estimator with (T - K*p - num_of_deterministic_terms) in its denominator, JMulTi seems to use T as the denominator. Again, it took me some time to understand the reason behind this choice ... and I found it: as described in chapter 5.2 of [1], when it comes to parameter constraints in Vector AutoRegressive (VAR) models, it makes sense to use T as the denominator in the calculation of the mentioned covariance matrix estimator. And since there are such constraints under the H0 of the Granger-causality test, it makes sense to use T as the denominator here too. With this tiny adaptation of the code (i.e. dividing the matrix by the factor mentioned above) my results suddenly equaled those of JMulTi.
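    A toy illustration of the denominator difference described above (not statsmodels code; the numbers are made up):

    import numpy as np

    T, K, p, num_det = 100, 3, 2, 1                 # obs., variables, lag order, det. terms
    resid = np.random.RandomState(0).randn(T, K)    # pretend these are VAR residuals

    # Small-sample estimator: denominator T - K*p - num_of_deterministic_terms
    sigma_small_sample = resid.T.dot(resid) / (T - K * p - num_det)

    # ML-style estimator (as used under the constrained H0): denominator T
    sigma_ml = resid.T.dot(resid) / T

    # The two differ exactly by the constant factor T / (T - K*p - num_det)
    factor = float(T) / (T - K * p - num_det)
    print(np.allclose(sigma_small_sample, sigma_ml * factor))  # True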

    The test for instantaneous causality also led to time-consuming bug-searching. However, I didn't really find a bug in my code. While desperately changing arbitrary values I found out that the corresponding test case passed when I based my test on a VAR(p+1) model. However, unlike the test for Granger-causality, the test for instantaneous causality should not be based on a VAR(p+1) but rather on a VAR(p) model according to [1]. I am curious whether this is a bug in JMulTi. To find out more I will try this test with other datasets too.

    With that, thanks for reading and let's get back to work ... or have a good lunch first : )

    [1] Lütkepohl, H. (2005): "New Introduction to Multiple Time Series Analysis"

    by yogabonito (noreply@blogger.com) at August 12, 2016 10:14 AM

    mkatsimpris
    (MyHDL)

    Documentation and Coverage Completed

    The coverage, documentation, and synthesis results are in the PR. I am waiting for Chris to review them and tell me what to change.

    by Merkourios Katsimpris (noreply@blogger.com) at August 12, 2016 09:29 AM

    meetshah1995
    (MyHDL)

    Verify -> Validate -> Vscale

    As you can probably guess from the post title, the last few weeks were mostly about verifying and validating the vscale modules.

    I developed unit tests for each module, and the ongoing work is to create unified tests for the entire assembly of modules. With the successful (*fingers crossed*) implementation of these tests, and some awesome documentation, the riscv module will finally come alive to be fully used by the MyHDL community :) .


    See you next week,
    MS.

    by Meet Pragnesh Shah (noreply@blogger.com) at August 12, 2016 03:01 AM

    Utkarsh
    (pgmpy)

    Markov Chain Monte Carlo: Metropolis-Hastings Algorithm

    As discussed in my previous post, we can use a Markov chain to sample from some target probability distribution P(x). To do so, it is necessary to design the transition operator of the Markov chain such that the stationary distribution of the chain matches the target distribution. The Metropolis-Hastings sampling algorithm allows us to build such Markov chains.

    Detailed Balance

    To understand how Metropolis-Hastings enables us to construct such chains, we need to understand reversibility in Markov chains. In my previous post I briefly described reversibility as:

    if the probability of a transition is the same as the probability of the reverse transition, then the chain is reversible.

    Mathematically we can write this as:

    π(x) T(x → x') = π(x') T(x' → x)

    where π is the stationary distribution and T(x → x') is the probability of transitioning from state x to state x'. This equation is called detailed balance.

    Now if the transition operator T is regular (a Markov chain is regular if there exists some number k such that, for every pair of states x and x', the probability of getting from x to x' in exactly k steps is greater than 0) and it satisfies the detailed balance equation relative to π, then π is the unique stationary distribution of T (for proof refer here).
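    As a small numerical illustration of detailed balance (a toy 2-state chain I made up, not taken from any library):

    import numpy as np

    # Transition matrix: trans[i, j] = probability of moving from state i to state j
    trans = np.array([[0.9, 0.1],
                      [0.2, 0.8]])

    # Stationary distribution: left eigenvector of the transition matrix with eigenvalue 1
    evals, evecs = np.linalg.eig(trans.T)
    pi = np.real(evecs[:, np.isclose(evals, 1)]).ravel()
    pi /= pi.sum()

    # Detailed balance: pi[i] * trans[i, j] == pi[j] * trans[j, i] for all i, j
    flows = pi[:, None] * trans
    print(np.allclose(flows, flows.T))  # True, so this chain is reversible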

    Metropolis-Hastings Algorithm

    Let π(x) be the desired stationary distribution, which matches the target probability distribution P(x). Let x and x' be any two states belonging to the state space of the Markov chain. Now using the detailed balance equation

    π(x) P(x → x') = π(x') P(x' → x)

    which can be re-written as:

    P(x → x') / P(x' → x) = π(x') / π(x)

    Now, we will separate the transition into two sub-steps (I'll explain why in a moment), the proposal and the acceptance-rejection. The proposal distribution Q(x' | x) is the probability of proposing a state x' given x, and the acceptance probability A(x', x) is the conditional probability to accept the proposed state x'. The transition probability can be written as the product of both:

    P(x → x') = Q(x' | x) A(x', x)

    Using this relation we can re-write the previous equation as:

    A(x', x) / A(x, x') = π(x') Q(x | x') / (π(x) Q(x' | x))

    Now since A(x, x') lies in [0, 1], and we want to maximize the acceptance of the new proposed state, we choose the acceptance probability as

    A(x', x) = min(1, π(x') Q(x | x') / (π(x) Q(x' | x)))

    The acceptance probability is the probability associated with the event of accepting the new proposed state, so whenever the acceptance probability is 1 we accept the new proposed state. But what about the case when the acceptance probability lies in [0, 1), i.e. is less than 1? In such cases we draw a random sample from Uniform(0, 1), and if the acceptance probability is higher than this number we accept the new state, otherwise we reject it. In some places this criterion is called the Metropolis acceptance criterion.

    In a nutshell, we can write the Metropolis-Hastings algorithm as the following procedure:

    1. Initialisation: Pick an initial state x at random

    2. Randomly pick a new proposed state x' according to Q(x' | x)

    3. Accept the state according to the Metropolis acceptance criterion. If the state is accepted, set the current state to x', otherwise keep it at x. Yield this current state as a sample

    4. Go to step 2 until the required number of samples has been generated.

    There are a few attractive properties of the Metropolis-Hastings algorithm which may not be visible at first sight.

    • First, the use of a proposal distribution for sampling. The advantage of using a proposal distribution is that it allows us to indirectly sample from the target distribution when it is too complex to sample from directly.

    • Secondly, our target distribution doesn't need to be normalized. We can use an un-normalized target distribution and our samples will be just as good as in the normalized case. If you look carefully at the calculation of the acceptance probability, we use a ratio of the target distribution, so the normalizing constant cancels out. The calculation of the normalizing constant is itself difficult (it requires numerical integration).

    Now the reason for splitting the transition probability should be clear: it allows us to take advantage of the proposal distribution.

    Enough of this theory; let's now use this algorithm to draw samples from the beta prime distribution.

    The probability density function of the beta prime distribution is defined as:

    f(x; a, b) = x^(a-1) (1 + x)^(-a-b) / B(a, b)

    where B(a, b) is a Beta function. We will ignore this normalizing constant.

    Since the beta prime distribution is defined for x > 0, we will choose our proposal distribution to be an exponential distribution

    Q(x'; λ) = λ exp(-λ x'),

    where the parameter λ controls the scale of the distribution.

    We will parametrize the proposal distribution with the previous value of the sample (see the code below).

    import numpy as np
    import scipy.stats as ss
    import matplotlib.pyplot as plt
    
    # Defining beta prime function, the value returned is an un-normalized probability
    beta_prime = lambda x, a, b: x**(a-1)*(1+x)**(-a-b)
    
    # Defining the transition function Q
    q = lambda x, scale: np.exp(-scale*x)
    
    def mcmc_beta_prime(num_samples, a, b, warm_up):
        np.random.seed(12345)
        samples = []
        x = np.random.exponential(1)  # The initial state x
        for i in range(0, num_samples):
            samples.append(x)
            # numpy's exponential takes the scale (= 1/rate), so this proposes x' ~ Exp(rate=x)
            x_prime = np.random.exponential(1/x)  # The new proposed state x'
            factor = q(x, x_prime)/q(x_prime, x)  # Hastings correction Q(x | x') / Q(x' | x)
    
            # The acceptance probability
            A = min(1, factor * beta_prime(x_prime, a, b) / beta_prime(x, a, b))
    
            # Accepting or rejecting based on Metropolis acceptance criterion
            u = np.random.uniform(0, 1)
            if u < A:
                x = x_prime
            else:
                x = x
        return samples[warm_up:]  # Discards samples from initial warm-up period
    
    # This function plots actual beta prime distribution against sampled
    def plot_beta_prime_and_samples(a, b):
        plt.figure()
        x = np.linspace(0, 100, 10000)
        y = [ss.betaprime.pdf(x_i, a, b) for x_i in x]
        plt.plot(x, y, label='Real distribution: a='+str(a)+',b='+str(b))
        plt.hist(mcmc_beta_prime(100000, a,b, 1000), normed=True, histtype='step',
                 bins=100, label="Simulated MCMC")
        plt.xlim([0, 5])
        plt.ylim([0, 2])
        plt.legend()
        plt.show()
        plt.close()
    
    plot_beta_prime_and_samples(5, 3)
    

    [figure: beta_prime_simulation]

    As we can see, our sampled values closely resemble the true beta prime distribution.

    The Metropolis-Hastings algorithm is a Markov chain Monte Carlo algorithm that can be used to draw samples from all kinds of discrete and continuous probability distributions, as long as we can compute a function f that is proportional to the density of the target distribution. But one disadvantage of the Metropolis-Hastings algorithm is that it can have a poor convergence rate. Let’s look at an example to understand what I mean by “poor convergence”. In this example we will draw samples from a 2D multivariate normal distribution.

    A multivariate normal distribution is represented as N(μ, Σ), where μ is the mean vector and Σ is the covariance matrix.

    The probability density at any point x is given by:

    p(x) = (1/Z) exp( −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) ),

    where Z = sqrt((2π)^k |Σ|) is the normalizing constant.

    Our target distribution will have

    mean μ = [0, 0]

    and covariance Σ = [[1, 0.97], [0.97, 1]].

    Our proposal distribution will be a multivariate normal distribution centred at the previous state with unit covariance, i.e.

    Q(x′ | x) = N(x, I),

    where I is the 2 × 2 identity matrix.

    import numpy as np
    import scipy.stats as ss
    import matplotlib.pyplot as plt
    
    # Defining target probability
    def p(x):
        sigma = np.array([[1, 0.97], [0.97, 1]])  # Covariance matrix
        return ss.multivariate_normal.pdf(x, cov=sigma)
    
    # Defining proposal distribution
    def q(x_prime, x):
        return ss.multivariate_normal.pdf(x_prime, mean=x)
    
    samples = np.zeros((1000, 2))
    np.random.seed(12345)
    x = np.array([7, 0])  # Start far from the high-density region so the slow mixing is visible
    for i in range(1000):
        samples[i] = x
        x_prime = np.random.multivariate_normal(mean=x, cov=np.eye(2), size=1).flatten()
        acceptance_prob = min(1, (p(x_prime) * q(x, x_prime) )/ (p(x) * q(x_prime, x)))
        u = np.random.uniform(0, 1)
        if u <= acceptance_prob:
            x = x_prime
        else:
            x = x
    plt.figure()
    plt.hold(True)
    plt.scatter(samples[:,0], samples[:,1], label='MCMC samples', color='k')
    plt.plot(samples[0:100, 0], samples[0:100, 1], 'r-', label='First 100 samples')
    plt.legend()
    plt.hold(False)
    plt.show()
    

    [figure: convergence_hastings]

    In the plot we can see that the Metropolis-Hastings algorithm takes time to converge towards the target distribution (slow mixing). Like the Metropolis-Hastings algorithm, many MCMC algorithms suffer from this slow mixing. Slow mixing happens because of a number of factors, such as the random-walk nature of the Markov chain, the tendency to get stuck at a particular sample, and sampling only from a single region of high probability density. In my next post we will look at some of the more advanced MCMC techniques, namely Hybrid Monte Carlo (Hamiltonian Monte Carlo / HMC) and the No-U-Turn Sampler (NUTS), which enable us to explore the target distribution more efficiently.
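    One simple way to quantify this slow mixing (my own addition, not from the original post) is the lag-k autocorrelation of the chain; strongly correlated successive samples mean the chain explores the target slowly:

    import numpy as np

    def lag_autocorr(chain, lag):
        """Sample autocorrelation of a 1-D chain at the given lag."""
        chain = np.asarray(chain) - np.mean(chain)
        return np.corrcoef(chain[:-lag], chain[lag:])[0, 1]

    # e.g. autocorrelation of the first coordinate of the samples above;
    # values that stay close to 1 even at large lags indicate slow mixing
    for lag in (1, 10, 50):
        print(lag, lag_autocorr(samples[:, 0], lag))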

    In the examples of the next post I will use my own implementations of HMC and NUTS (written for pgmpy), which will require the latest installation of pgmpy in your working environment. For installation instructions you can look here.

    August 12, 2016 12:00 AM


    jbm950
    (PyDy)

    GSoC Week 13

    This week I spent a lot of time working on FeatherstonesMethod and its component parts. I started off by moving a bunch of spatial vector functions from another PR I have to the featherstone PR and used some of those functions to calculate the spatial inertia of Body objects. The next thing I worked on was completely rewriting the internals of the joint code. The joints now consist of 4 reference frames and points (one set at each of the bodies’ mass centers and one set per body at the joint location).

    After this I ran some basic code that used these new features and kept making changes until the code was able to run without producing errors. I used this same method of work with FeatherstonesMethod and now it too is able to run without producing errors. Now that the code runs it was time to make sure that the output is correct which is a lot more involved than the previous step of work. To begin I solved for the spatial inertia by hand and used this calculation to create test code for Body.spatial_inertia. As expected the code initially was completely incorrect but it now passes the test. I have since been working on the tests for the joint code. Since this code is completely new to the sympy repository it takes a lot more planning than the body test did. Also I need to solve the kinematics by hand for the joints so that I have a base for the test code. This is where I am currently located in the process.

    Also this week I addressed review comments on SymbolicSystem and have moved that PR closer to being able to merge. One of the current hang ups is trying to force Sphinx to autodocument the __init__ method. I think the best solution currently is to move the relevant code back to the main docstring for the class and not worry about trying to document the __init__ method.
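    For reference, one common way to get Sphinx autodoc to pick up __init__ documentation (a general Sphinx option, not necessarily the route taken in this PR) is to set autoclass_content in conf.py:

    # conf.py -- Sphinx configuration
    extensions = ['sphinx.ext.autodoc']

    # Concatenate the class docstring and the __init__ docstring when
    # documenting a class with .. autoclass:: / automodule.
    autoclass_content = 'both'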

    While working on rewriting the joint code I came across a bug in frame.py and have put together a fix for it, along with a test to make sure the fix works.

    Lastly I reviewed a PR that adds a docstring to a method that did not yet have a docstring. The PR had some information in it that was incorrect and after some research I was able to make some suggestions for its implementation.

    Future Directions

    Next week is the last full week of GSoC and my main priority is getting the final evaluation information finished correctly so that the work can be processed properly. My next goal is to make sure SymbolicSystem gets merged into SymPy. This is not entirely in my hands, however, as I will have to wait for feedback, so while waiting I will be pulling different parts of FeatherstonesMethod out into separate PR’s at the recommendation of my advisor. I hope to include these separate PR’s in my final evaluation as well.

    PR’s and Issues

    • (Open) [WIP] Added system.py to physics/mechanics PR #11431
    • (Open) [WIP] FeatherstonesMethod PR #11415
    • (Open) Added docstring to jordan_cell method PR #10356

    August 12, 2016 12:00 AM

    August 11, 2016

    Ranveer Aggarwal
    (dipy)

    Window Event Handling for Panel

    Moving the Panel

    The panel that we began working on last week has finally been completed. Here are the major changes:

    1. Every UI element now has a set_center function. This function does exactly what it says, it sets the center of the UI element to where we want.
    2. We now store the relative positions of the elements within a panel. This is so that we can move the panel around and the elements can be re-allotted centers in accordance with the panel’s new center.
    3. Major changes in how sliders work. This is basically to facilitate the movement of individual slider elements when the slider as a whole is moved.

    Using the above, we can now move the panel around.
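    The idea can be sketched roughly like this (an illustrative sketch with hypothetical names, not dipy's actual classes; it assumes each child element exposes center and set_center):

    class Panel2D:
        def __init__(self, center, elements):
            self.center = center
            self.elements = list(elements)
            # Store each element's offset relative to the panel's center,
            # so the layout survives when the panel is dragged around.
            self.relative_offsets = [
                (el.center[0] - center[0], el.center[1] - center[1])
                for el in self.elements
            ]

        def set_center(self, new_center):
            """Move the panel and re-allot centers to all child elements."""
            self.center = new_center
            for el, (dx, dy) in zip(self.elements, self.relative_offsets):
                el.set_center((new_center[0] + dx, new_center[1] + dy))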

    Moving the Panel Around

    Aligning the Panel

    Now, the panel can be left-aligned or right-aligned to the window. Left-alignment means that its position with respect to the left window boundary will remain constant. Similarly for right-alignment. This was done using the set_center in the above step and window modification events.

    A Right-Aligned Panel

    What’s next

    GSoC ends in less than a couple of weeks. In this final sprint the following needs to be done:

    • A file dialog needs to be built. For now, we’ll be saving and opening files.
    • Refactoring for existing code.
    • Making PRs into master.

    August 11, 2016 11:47 AM

    Shridhar Mishra
    (italian mars society)

    Final week

    Things done:

    • Integration of Tango and PyKinect2.
    • Can get skeleton data along with RGB and other information like depth.
    • Skeleton coordinates available in numpy format, which is compatible with PyTango.
    • Client side code done.
    • IPC between C# and Python in place.

    Things to do:

    • Fix bug related to incompatible data type for Push_change_event on the Tango server.
    • Test the system for transmission.


    by Shridhar Mishra (noreply@blogger.com) at August 11, 2016 10:41 AM

    mr-karan
    (coala)

    GSoC Week 12 Updates

    • The coding period is about to end and this is the final week to clean up all work and prepare documentation. I have completed most of my tasks and also got coala-bears docs website merged in master repo. The website underwent a major overhaul since last week and a lot of changes have been implemented. A few of the changes include
      • Linking to any page works without the need for Get Started button.
      • Improved design of cards with more info present on card-reveal option.
      • Added a language page.
      • Implemented fuzzy search filter
      • A small text added which shows the filter activated
      • Add an option to reset the filter
      • Add ASCIINEMA support
      • Changed color scheme & improved UI a bit

    You can have a look here https://youtu.be/lcZUh-US8TU

    • For the docs, I needed all available bears’ Asciinema URLs, which were present on @coala, so I hacked up a quick script to grab the URLs from the tweets and created a new ASCIINEMA_URL attribute in coala. Here’s the PR, and for coala-bears: here.

    • I have also refactored the Syntax Highlighting PR based on the reviews I got. There were a few formatting issues, and the PR is likely to get approved soon-ish. Hope it makes it to the coala 0.8 release on time. :smile:

    • I also created VultureBear, which performs dead code analysis on your Python code. Check it out here:

    • I am also working on RustLintBear, which will be the first bear in coala to support the Rust language.

    • Thanks to Lasse, I am now the maintainer of coala engagement related tasks

    Future Tasks

    I’ll be wrapping up my work this week and will be submitting a document which will have links to all my commits during the GSoC period.

    Happy Coding!

    August 11, 2016 12:10 AM

    August 10, 2016

    mike1808
    (ScrapingHub)

    GSOC 2016 #5: Creating bridges

    Last week I started working on a killer feature for Splash. It will allow you to write Lua scripts using almost the same Element (Node, HTMLElement) API as in JavaScript, plus some additional helpful methods.

    For example, say you want to save a screenshot of an image once it has loaded. Here is the script for it:

    function main(splash)
        assert(splash:go(splash.args.url))
        assert(splash:wait(1))
        
        local shots = {}
        
        local element = splash:select('#myImage') -- selecting the element by its CSS selector
        element.onload = function(event)          -- attaching the event listener
           event:preventDefault()
           table.insert(shots, element:png())     -- making a screenshot of the element
        end 
        
        return shots
    end
    

    The Element API is still in development and can be changed

    JS <-> PyQt <-> Python <-> Lua

    Let’s see how the communication between JS and Lua is implemented. Imagine that we are going to execute the following Lua code:

    element:click()
    

    Lua

    splash is a table which has a metatable with the Splash prototype; in Lua terms this means splash is an instance of the Splash class. The click method is wrapped in several Lua functions. After executing those functions, we eventually call the click Python method. This is possible because of the Lupa runtime for Lua, which allows Python methods to be injected into Lua code.

    Python

    click is a method of _ExposedElement Python class which contains all the methods and properties which can be accessed in Lua. It binds Python functions with Lua functions.

    Let’s return to our click method. It does the following when it is called:

    • calls private_node_method passing the "click" string which means that we want to call the click method of our JavaScript DOM element
    • private_node_method is another method of _ExposedElement; it calls the node_method method of the self.element object, which is an instance of the HTMLElement class;
    • HTMLElement is a class which has an API for communicating with the JavaScript HTMLElement
    • HTMLElement#node_method calls PyQt method evaluateJavaScript() with the following JS code:
    window[elements_storage][element_id]["click"]()
    
    • Description
      • elements_storage is our elements storage, which is a PyQt object; it allows us to save DOM elements for further access
      • element_id is a unique ID which allows us to identify our element object
      • "click" is a method name which want to call (in this case it is “click”)

    The elements storage is added to the JS window object using the addToJavaScriptWindowObject method of PyQt.

    So, our Python self.element is connected to the JS node using the element_id.
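    Roughly speaking, the Python side only has to build a small JS snippet and hand it to the WebKit frame for evaluation. A simplified sketch of that idea (the storage constant and helper name are hypothetical, not Splash's actual code):

    ELEMENTS_STORAGE = "__splash_elements"   # hypothetical storage name

    def build_node_method_call(element_id, method_name, *js_args):
        """Build the JS snippet that calls a stored DOM element's method."""
        args = ", ".join(js_args)
        return 'window["{storage}"]["{eid}"]["{method}"]({args})'.format(
            storage=ELEMENTS_STORAGE, eid=element_id,
            method=method_name, args=args)

    # e.g. for our click example; in Splash the resulting string would be
    # handed to the WebKit frame for evaluation
    print(build_node_method_call("element-42", "click"))
    # window["__splash_elements"]["element-42"]["click"]()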

    PyQt

    PyQt allows us to have a WebKit runtime environment in our Python application. Using addToJavaScriptWindowObject we can add instances of QObject to the JS window object, which in turn allows us to call Python methods from JS.

    JS

    In JS our node can be accessed through the window[elements_storage][element_id] object.

    This flow is fine for one direction: from Lua to JS. But what if we want to call a Lua function from JS? That happens when we assign an event handler for some event; in our first example we assigned an event handler for the load event.

    JS -> Lua

    Let’s examine this code:

    element.onload = function(event)
       event:preventDefault()
       table.insert(shots, element:png()) 
    end 
    

    We assign an event handler for the load event of our element. How does it work?

    1. When onload property of element is accessed it calls the __newindex metamethod of element.
    2. This metamethod checks whether the requested property has the 'on' prefix. If it does, we call the private method set_event_handler of element.
    3. In its turn set_event_handler calls Python method private_set_event_handler of _ExposedElement passing the event name for which we want to assign a handler, and the reference to handler function itself.
    4. The crazy part starts here. We wrap our Lua function in a Lua coroutine, which allows us to execute it when the event is fired.
    5. We pass that coroutine to set_event_handler method of HTMLElement Python class.
    6. It saves that coroutine in another storage, called the event handlers storage, and returns its ID.
    7. Using PyQt evaluateJavaScript() method we execute the following JS code:
    window[elements_storage][element_id].onload = function(event) {
        window[event_handlers_storage].run(
            event_handler_id, 
            window[events_storage].add(event),
        )
    }
    

    You may think: what does window[event_handlers_storage].run do and what is window[events_storage]?

    • window[event_handlers_storage].run
      • it calls our event handlers storage (which was injected in the same way as elements storage) run method;
      • that method, using the specified event_handler_id, calls the saved coroutine;
      • that coroutine will call our Lua function that was assigned to onload property of our element Lua table;
    • window[events_storage]
      • it’s another storage, but now for events;
      • the main reason for it is calling a few methods of our event (preventDefault, stopPropagation, etc.).

    As you can see, in order to access our DOM element we have to go through all these layers until we reach JS.

    In the following days I will finish up writing tests and documentation for the newly created Element object. I will also try to refactor those classes and methods to make the Lua <-> JS path simpler.

    August 10, 2016 12:00 AM

    August 08, 2016

    Yen
    (scikit-learn)

    Workaround to use fused types class attributes

    In some of my previous blog posts , we’ve seen Cython fused types’ ability to dramatically reduce both memory usage and code duplication.

    However, of course there are still some deficiencies in Cython fused types. In this blog post, we are gonna see one of the biggest inconveniences of fused types and how to address it using a hacky workaround.

    Fused Types Limitation

    When you enter the official page of Cython fused types, you can easily find the following warning:

    Note Fused types are not currently supported as attributes of extension types. Only variables and function/method arguments can be declared with fused types.

    It means that if you have a class written in Cython such as the following:

    cdef class Printer:
    	cdef float num;
    	
    	def __init__(self):
    		self.num = 0
    		print self.num		
    

    You are not allowed to use fused types to make this class more generic. For example, suppose we change the type of the attribute num from float to floating in the above code snippet:

    Note: cython.floating can either refer to float or double.

    from cython cimport floating
    cdef class Printer:
    	cdef floating num;
    
    	def __init__(self):
    		self.num = 0
    		print self.num
    

    It will result in the error below, since fused types can’t be used as extension type attributes:

    Fused types not allowed here.
    

    Intuitive Solution

    Based on my previous experience with Cython fused types and a suggestion from my mentor Joel, it is intuitive to declare the attribute we want to be fused types as void*, and then typecast it in every function where it is accessed.

    To be more concrete, let’s look at the code:

    from cython cimport floating
    cdef class Printer:

        # We wish num to be fused types, so declare it as void*
        cdef void *num

        def __init__(self):
            cdef float value = float(5)
            self.num = &value

            # Typecast it when we want to access its value
            cdef floating *num_ptr = <floating*>self.num
            print num_ptr[0]
    

    However, the above code will again result in an error, due to an unwritten rule of fused types:

    Fused types can only be used in a function when at least one of its arguments is declared to be that fused type.

    This rule exists because Cython fused types work by generating multiple C functions, where each function’s name encodes the actual type it refers to. If the fused type is not involved in the function’s signature, Cython cannot tell the specializations apart, since each generated function would end up with the same name.

    Workaround

    Based on the above unwritten rule, here’s the final workaround we can adopt:

    from cython cimport floating
    cdef class Printer:

        # We wish num to be fused types, so declare it as void*
        cdef void *num

        cdef bint is_float

        # Dummy attributes, used only to pick the fused type specialization
        cdef float float_sample
        cdef double double_sample

        def __init__(self):
            cdef float num = float(5)
            self.num = &num

            # In real code this flag would be derived from the input data's precision
            if type(num) == float:
                self.is_float = True
            else:
                self.is_float = False

            if self.is_float:
                self._print(self.float_sample)
            else:
                self._print(self.double_sample)

        # Underlying function; the floating argument only selects the specialization
        def _print(self, floating sample):
            # Typecast it when we want to access its value
            cdef floating *num_ptr = <floating*>self.num
            print num_ptr[0]
    

    As you can see, we also have to modify the functions through which the attribute is accessed, keeping the original function signature as a wrapper and introducing a fused-type argument into the underlying implementation function.

    That’s it; although it looks really hacky, it works! Hope that Cython adds this functionality soon.

    Summary

    Please leave any thoughts you have after reading; let’s push Cython’s limitations together!

    August 08, 2016 11:52 PM

    SanketDG
    (coala)

    Things that needed fixing.

    This week I am going to talk about the things that I have been working on, mainly on PR #2423. This is the second last week of GSoC and I fixed a lot of quirks that I was facing for the first two weeks.

    The first one deals with opinionated documentation styles. Most projects follow this style of documentation:

    :param x:           blablabla
    :param muchtoolong: blablabla
    

    While this format is used in most projects, it requires a lot of maintenance (and patience). One extra line or a few extra words and you have to literally “re-design” the entire documentation comment. But there’s another, life-saving style.

    :param x:
        blablabla
    :param muchtoolong:
        blablabla
    

    When I was writing the parsing algorithm for extracting documentation metadata, I had completely forgotten about this style, and thus when I took the algorithm for a test drive, it indeed failed. The bug and the solution were both simple. The algorithm expects a space after the metadata symbols, which wouldn’t ideally happen in the second style. Thus, removing the space clearly solves the problem and parses everything correctly. This affects parsing of the first style in a small way, where we now have to account for an extra space.

    In the future: I am hoping to improve this. The current process of searching for strings is not the most efficient way; I am thinking of slowly transitioning to regex for this.

    Another tiny bug that I found was in the documentation extraction algorithms that were already implemented by my mentor @Makman2.

    To talk about this, I need to explain what a documentation marker means in my project. It’s basically a 3-element tuple of strings that defines how a documentation comment starts, continues and ends.

    So for a python docstring it would look something like ('"""', '', '"""'). For javadoc, it would look like ('/**', ' *', ' */')

    Now the bug was that for documentation comments that were identified with no middle marker, i.e. marker[1] = '', it was completely ignoring lines that only contained a \n, i.e. an empty line. This would lead to wrong parsing. The solution (for now) was a simple if-statement to insert a newline if it found an empty line.
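    To illustrate what the markers do, here is a simplified sketch of the idea with a hypothetical helper name (not coala's actual implementation):

    def strip_markers(comment_lines, markers):
        """Strip a (start, each-line, end) marker tuple from a raw doc comment."""
        start, each_line, end = markers
        lines = list(comment_lines)
        lines[0] = lines[0].replace(start, "", 1)
        lines[-1] = lines[-1].replace(end, "", 1)
        stripped = []
        for line in lines:
            if each_line and line.lstrip().startswith(each_line.strip()):
                line = line.lstrip()[len(each_line.strip()):]
            stripped.append(line if line.strip() else "\n")  # keep empty lines
        return "".join(stripped)

    javadoc = ["/** Adds two numbers.\n", " *\n", " * @param a first number\n", " */\n"]
    print(strip_markers(javadoc, ("/**", " *", " */")))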

    Also, I fixed Escaping. Although I am still not sure that the solution is bulletproof and would work for all cases, it’s good enough. Also, turns out that I have been doing the setting extraction wrong, getting the escaped value (and not the unescaped one).

    Also, I removed some code! As a developer, it feels great to remove more and more lines of code that you don’t need. First, I removed a relatively useless exception handling in the parsing algorithm.

    Second, I moved one of the functions that loads a file and returns its lines as a list. It was being used by the testing classes in three separate files, and thus was moved to a new file, TestUtils.py, from where it is now imported. REFACTOR EVERYTHING!!!

    Lastly, now DocumentationComment requires a DocstyleDefinition object, instead of language and docstyle (which I always thought was redundant). This kind of falls in refactoring, and thus more removing!

    Coming to things that were added, I finalized the design of the assembling functions, with the help of my mentor. So we decided on having two functions. One constructor-like function that would just arrange the documentation text from the parsed documentation metadata. So it wouldn’t be responsible for the markers and indentation. It returns a DocumentationComment object that contains all the required things for the final assembling. This function could sometimes act like a constructor, where it takes parsed metadata and spits out a readymade DocumentationComment object ready for use.

    The final assembling function just assembles the documentation taking into account the markers and indentation. It returns the assembled documentation comment as a string that can be added/updated in files. While developing this, I actually found out that my algorithm for doing this was totally buggy and would not work for a lot of corner cases, so I am in the process of working them out.

    Also, on a side note, I figured out the metadata settings. This is important because it is important to implement some variable functionality as settings, since that gives the user the freedom to define what they want to parse. Right now the concept is in its infancy; for example, settings for a conventional Python docstring would look like:

    param_start = :param\ # here's a space
    param_end = :
    return_sep = :return:
    

    That’s all for this blog post, I guess. I am almost done with the work in the core repo. I can finally start developing some cool bears!

    August 08, 2016 05:30 PM


    mkatsimpris
    (MyHDL)

    Documentation

    Today I started writing the documentation for all of my modules and the complete frontend part. I will use Sphinx. As the backend is not ready yet, I will fill the time with this task.

    by Merkourios Katsimpris (noreply@blogger.com) at August 08, 2016 08:37 AM

    Riddhish Bhalodia
    (dipy)

    Brain Extraction Walkthrough!

    Over the last week and the coming few weeks I will be working on polishing all three of my PR’s, i.e. adaptive denoising, local PCA denoising and the robust brain extraction.

    I have already described the tutorials for local PCA and adaptive denoising in one of the previous blog posts (here), so in this one I will focus on explaining the brain extraction tutorial, and then describe what is left to do and the new exciting directions that are the real output of this Google Summer of Code project.

    Brain Extraction Walkthrough!

    The brain extraction which we developed takes help from template data (a T1 image with skull and its corresponding brain mask). So let us first load the related modules.

    We need the affine information, as the algorithm for the brain extraction performs image registration as one of its major steps (here).

    [screenshots]

    Now we apply the brain_extraction function which takes the input, template data and template mask as inputs, along with their affine information. There are five other parameters which can be given to the function.

    The same_modality parameter takes the boolean value true if the input and template are of the same modality and false if they are not. When it is false, the only useful parameters are patch_radius and threshold; the rest are only used when the modalities are the same.

    The patch_radius and block_radius are the inputs for block-wise local averaging, which is used after the registration step in the brain extraction. The parameter value, which defaults to 1, governs the weighting, and the threshold value governs the eroded boundary coefficient of the extracted mask. For more info on how these parameters work, please look at the fast_patch_averaging function in dipy.segment.

    First we look at the input and template with the same modality (both are T1 images)

    [screenshot]

    [screenshot]

    Now the data we have used for this experiment is the IBSR database which has manually segmented brain masks as well. This is good because we can compare our output of the brain extraction with their manual mask (from above figure we can see that the algorithm does a pretty good job).

    So to compare the two masks we use Jaccard’s Measure as follows

    [screenshot]
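    Since the code in the screenshot is not reproduced here, a small numpy sketch of how a Jaccard index between two binary masks can be computed (my own version for illustration, not necessarily the code shown in the original figure):

    import numpy as np

    def jaccard_index(mask1, mask2):
        """Jaccard index (intersection over union) of two binary masks."""
        mask1 = mask1.astype(bool)
        mask2 = mask2.astype(bool)
        intersection = np.logical_and(mask1, mask2).sum()
        union = np.logical_or(mask1, mask2).sum()
        return intersection / float(union)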

    For the above image we get a Jaccard index of 0.8428, which means it is very close to the manually extracted mask.

    Now we look at how the brain extraction behaves when we choose two images with different modalities. In this case our template is of T1 modality and the input image is of B0 modality.

    [screenshots]

    This is the whole brain extraction tutorial. To give an idea of how fast this algorithm works, I have put the runtimes together with the data sizes:

    [A] For same_modality = True

    Input T1 volume :  (256, 256, 128)
    Template volume : (193, 229, 193)
    Time taken : 521.42 seconds

    [B] For same_modality = False

    Input B0 volume : (128, 128, 60)
    Template volume : (193, 229, 193)
    Time taken : 43.98 seconds

    This concludes this blog. In the coming week I will put up one of my last GSoC blogs, which will summarize the projects and point to the new directions that have emerged from these 3 months.

    Thank You


    by riddhishbgsoc2016 at August 08, 2016 08:16 AM

    kaichogami
    (mne-python)

    GSoC Summary

    GSoC 2016 is almost coming to an end. I got a very nice opportunity to connect and work with extremely knowledgeable, experienced and helpful people. Over the three months I learnt about brain signals and machine learning, got experience in designing an API in collaboration with a community, talked to prominent researchers and developers and, lastly, left at least a tiny contribution to the big open source world.
    My project involved making the decoding module of MNE compatible with scikit-learn, mainly enabling its classes to be pipelined and evaluated with `cross_val_score`. I started coding after discussing the API with Jean and Dennis. My first task involved refactoring the Xdawn algorithm. It was done successfully, although it took me more time than it should have. Following that I implemented two other classes, one of which was an improvement of Jean’s work.

    There is still a lot of work to do with decoding, which I will follow up on after GSoC. Everyone in the community is extremely helpful and forgiving to a newcomer like myself. All my mentors responded to my queries and guided me without any delay. I am especially thankful to Jean, who never showed any hesitation in helping me.


    by kaichogami at August 08, 2016 04:26 AM

    August 07, 2016

    Adhityaa Chandrasekar
    (coala)

    GSoC '16: Update

    Hello again!

    Big advancements and changes for this update.

    I have almost got my whole project merged! It is in the very last stages with one or two tiny changes to make and then it's done!

    There have been a few changes design-wise:

    • The number of questions has been reduced to just one: this is the ultimate quickstart setup. You just need to give the project directory now and the coafile will be automatically generated. No interaction from the user at all!
      Basically, the question asking the user for files to match now defaults to everything. And the files to ignore are automatically identified from the .gitignore file. Pretty neat huh?

    • No more complicated section globs. Instead of having an unnecessarily long section, we're now generating concise globs that virtually do the same thing.

    • Settings filling: instead of leaving the mandatory settings to be asked for at runtime, we're now prompting the user for the values at coafile generation itself. This is more logical.

    Here's the coafile generated when I ran coala-quickstart on coala-quickstart's project directory:

    [default]
    bears = LineLengthBear, LineCountBear, SpaceConsistencyBear, InvalidLinkBear, KeywordBear, FilenameBear
    files = **.py, **.yml, **.rst, **.c, **.js
    ignore = .git/**, **/build/**, **/htmlcov/**, htmlcov/**, **/src/**
    max_lines_per_file = 1000
    use_spaces = True
    cs_keywords, ci_keywords = 
    
    [python]
    bears = CPDBear, PyCommentedCodeBear, RadonBear, PyUnusedCodeBear, PEP8Bear, PyImportSortBear, PyDocStyleBear, PyLintBear
    files = **.py
    language = python
    
    [yaml]
    bears = YAMLLintBear
    files = **.yml
    
    [restructuredtext]
    bears = reSTLintBear
    files = **.rst
    
    [c]
    bears = GNUIndentBear, ClangASTPrintBear, CPPCheckBear, CSecurityBear, ClangBear, ClangComplexityBear
    files = **.c
    
    [javascript]
    bears = CPDBear, ESLintBear, JSComplexityBear, JSHintBear
    files = **.js
    language = python
    

    I really like this: this was how I envisioned the coafile to look like originally and it's panning out even better.

    I'm now in the last week of my project. I'm expecting the PR to be merged today and then I'll be focussing on the prototype I have for guessing each bear's params. I'll make an update post again next week.

    Till then,
    Adhitya

    August 07, 2016 06:30 PM

    GSoC '16: Final Report

    GSoC 2016 was one of the best things I've had the opportunity to participate in. I've learned so much, had a lot of fun with the community the whole time, got to work on something that I really like and care about, got the once-in-a-lifetime opportunity to visit Europe, and still get paid in the end. And none of this would have been possible without the support and help from the coala community as a whole. Especially Lasse, who was my mentor for the program, from whom I've learned so, so much. And Abdeali, who introduced me to coala in the first place and help me get settled in the community. It honestly wouldn't have been possible without any of them, and I really mean it. Seriously, thank you :)

    List of commits I've made over the summer

    The last three months have been action packed. Check 'em out for yourself:

    coala-quickstart

    Commit SHA Commit
    b8d8349 Add tests directory for testing
    df99516 py.test: Execute doctests for all modules
    3d01aed Create coala-quickstart executable
    28a33f9 Add coala bear logo with welcome message
    759e445 generation: Add validator to ensure path is valid
    111d984 generation: Identify most used languages
    4ace132 generation: Ask about file globs
    8f7fe23 generation: Identify relevant bears and show help
    839fa19 FileGlobs: Simplify questions
    7c98e48 Settings: Generate sections for each language
    b28e20c Settings: Write to coafile
    69a5d2f Generate coafile with basic settings
    60bee9a Extract files to ignore from .gitignore
    62978ad Change requirements
    36c8486 Enable coverage report
    d78e85e Bears: Change language used in tests
    4a8819e setup.py: Add myself to the list of maintainers
    54f21c6 gitignore: Ignore .egg-info directories
    6a7b63a Bears: Use only important bears for each language

    coala

    Commit SHA Commit
    45bfec9 Processing: Reuse file dicts loaded to memory
    ef287a4 ConsoleInteraction: Sort questions by bear
    7d57784 Caching: Make caching default
    1732813 Processing: Switch log message to debug
    01890c2 CachingUtilitiesTest: Use Section
    868c926 README: Update it
    f79f53e Constants: Add strings to binary answers
    2d7ee93 LICENSE: Remove boilerplate stuff
    da6c3eb Replace listdir with scandir
    ad3ec72 coalaCITest: Remove unused imports
    91c109d Add option to run coala only on changed files
    5a6870c coala: Add class to collect only changed files
    622a3e5 Add caching utilities
    e1b3594 Tagging: Remove Tagging

    coala-utils

    Commit SHA Commit
    27ee83c Update version
    64b0e0b Question: Validate the answer
    1046c29 VERSION: Bump version
    bd1e8fa setup.cfg: Enable coverage report
    79fee96 Question: Use input instead of prompt toolkit
    cfd81c1 coala_utils: Move ContextManagers from coalib
    c5a4526 Add MANIFEST
    f019962 Change VERSION
    9db2898 Add map between file extension to language name
    a52a309 coala_utils: Add Question module

    That's a +2633 / -471 change! I honestly didn't know it'd be that big. Anyway, those were the technical stats. On to the showcase!

    Stuff I worked on

    My primary GSoC proposal: coala-quickstart

    coala-quickstart

    And here's the coafile that's generated:

    Pretty neat stuff, huh? :)

    Anyway, that was my whole project in a nutshell. I worked on other stuff too during the coding period. Here are some of the results:

    Caching in coala

    This is another thing I'm proud of: caching in coala. Remember how you had to lint all your files every time even if you changed just one line? No more. With caching, coala will only collect those files that have changed since the last run. This produces a terrific improvement in speed:

                       Trial 1   Trial 2   Trial 3   Average
    Without caching    9.841     9.594     9.516     9.650
    With caching       3.374     3.341     3.358     3.358

    That's almost a 3x improvement in speed!

    Initially, caching was an experimental feature since we didn't want to break stuff! And this can break a lot of stuff. But fortunately, everything went perfectly smoothly and caching was made default.
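    The core idea behind such a cache is simple: remember each file's last modification time from the previous run and only hand the changed files to the bears. A generic sketch of that idea (illustrative only, not coala's actual implementation; the cache file name is made up):

    import json
    import os

    CACHE_FILE = '.file_mtime_cache.json'   # hypothetical cache location

    def changed_files(paths, cache_file=CACHE_FILE):
        """Return the files whose mtime changed since the last run."""
        try:
            with open(cache_file) as f:
                last_seen = json.load(f)
        except (IOError, ValueError):
            last_seen = {}

        changed = [p for p in paths
                   if os.path.getmtime(p) > last_seen.get(p, 0)]

        # Remember the current mtimes for the next run.
        last_seen.update({p: os.path.getmtime(p) for p in paths})
        with open(cache_file, 'w') as f:
            json.dump(last_seen, f)
        return changed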

    README overhaul

    The coala README page got a complete overhaul. I placed a special emphasis on simplicity and the design; and to be honest, I'm quite happy with the outcome.

    Other miscellaneous stuff

    I worked on other tiny things during the coding phase:

    • #2585: This was a small bugfix (to my annoyance, introduced by me). This also led to a performance improvement.
    • #2322: scandir is a new Python 3.5 feature that is faster than the traditional listdir used to get a directory's contents.
    • e1b3594: I removed Tagging with this commit. It was unused.
    • #11, #14: A generic tool to ask the user a question and return the answer in a formatted manner. This is now used in several packages across coala.

    There were other tiny changes, but you can find them in the commit list.

    Conclusion

    It's really been a blast, right from the start to the finish. Thanks to everyone who has helped me in any way. Thanks to Google for sponsoring such an awesome program. Thanks to the PSF for providing coala with an opportunity at GSoC. I honestly can't see how this would have been possible without any of you.

    To everyone else, I really recommend contributing to open-source. It doesn't have to be coala. It doesn't even need to be a big project. Just find a project you like: it can even be a silly project that doesn't do anything useful. The whole point is to get started. GSoC is one way to easily do that. There is such a wide variety of organizations and projects, I'm pretty sure at least one project will be to your liking. And you're always welcome at coala. Just drop by and say hello at our Gitter channel.

    Adhityaa

    August 07, 2016 06:30 PM

    mkatsimpris
    (MyHDL)

    Week 11

    This week I completed the convertible tests for the frontend part and for the new color converter. Vikram made a PR for the backend part, so in the next days we can integrate it with my part and complete the encoder. However, the backend still lacks complete test coverage against a software prototype. In the days left until August 15, which is the end of the coding period, I will try to finish the

    by Merkourios Katsimpris (noreply@blogger.com) at August 07, 2016 09:02 AM

    Vikram Raigur
    (MyHDL)

    Huffman Module

    I was stuck a bit while implementing the Huffman module initially. I was thinking about building the Huffman tables on the fly, but soon realised it’s a very difficult task.

    Finally, I changed my plan and built the tables using the JPEG standard.

    I added all the Huffman-encoded values to a CSV file and then built ROM tables from that CSV file.
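    Something along these lines, for instance (an illustrative sketch; the CSV column layout used here is an assumption, not the project's actual file format):

    import csv

    def load_huffman_rom(csv_path):
        """Build a ROM table {symbol: (code_length, code_word)} from a CSV file."""
        rom = {}
        with open(csv_path) as f:
            for symbol, length, code in csv.reader(f):
                # e.g. a row like: 0x03, 3, 100
                rom[int(symbol, 0)] = (int(length), int(code, 2))
        return rom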

    The Huffman module has a small state machine sitting inside it which converts the given parallel Huffman-encoded code into a unique serial code.

    We generally concatenate both the variable-length integer (run-length encoded) and the variable-length code (Huffman encoded) and store them in a FIFO.
    In some cases, when the input to the run-length encoder does not have any zeroes, compressing it with the Huffman encoder is a bad idea because it does not save any space.

    Finally, the Huffman encoder has been merged into the main repo.


    by vikram9866 at August 07, 2016 08:31 AM

    Quantizer module

    The Quantizer module:

    This module uses a divider placed in its core. The quantizer ROM is built using standard JPEG values.

    Right now, the quantizer ROM has fixed values. In the future we plan to implement a mechanism so that the quantizer ROM can be programmed by the user.
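    For context, the operation the module implements is an element-wise division of each DCT block by the quantization table, followed by rounding. A quick numpy sketch of what the hardware computes (my own illustration; the real table values come from the JPEG standard and the ROM, a flat toy table is used here):

    import numpy as np

    def quantize_block(dct_block, quant_table):
        """Quantize an 8x8 block of DCT coefficients."""
        # quant_table holds the 8x8 quantization values (JPEG standard / ROM)
        return np.rint(dct_block / quant_table).astype(np.int32)

    # toy example with a flat table of 16s
    block = np.arange(64, dtype=float).reshape(8, 8)
    print(quantize_block(block, np.full((8, 8), 16.0)))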

    The Quantizer module has been added to the main repo.


    by vikram9866 at August 07, 2016 08:21 AM

    August 06, 2016

    Avishkar Gupta
    (ScrapingHub)

    Code Review, Optimizations and Formal Benchmarking

    Hi,

    Firstly, sorry for the extremely late blog post this time around, however I was waiting on my mentor’s comments before I gave another status report because I wanted his take on where we are progress wise and the status of the pull request. So without further ado, let’s get into it.

    The majority of the last two weeks was spent writing unit tests and cleaning the code wherever possible, removing any outdated constructs and pushing code for review. I also formalized the benchmarking suite using the djangobench code as a starting point, as mentioned previously; there were some finishing touches left last time which were completed in this code cycle.

    The test coverage of the patch is complete and we have 100% diff coverage, all seems well there.

    Having finished with the benchmarking, I started looking into documentation, as the refactor means a re-write of the signal API documentation is in order, even though we still have full backward compatibility support. Now that the code review comments are in, I’ll be looking to work on those issues and get them sorted out at the earliest. We also agreed that the benchmarks would not make sense as a part of scrapy bench, since that would require keeping the old dependencies as part of the project, which makes no sense as they are no longer required. The best solution we came up with is to keep the benchmarks elsewhere and include a link to them in the PR itself, so a record of them is maintained.

    All in all, we’re happy with how the project has turned out, and we’ll probably be seeing the PR merged into the mainline sometime in the future.

    I’ll update this post as soon as I can think of more stuff I want to write :)

    August 06, 2016 11:00 PM

    tushar-rishav
    (coala)

    Python f-strings

    Hey there! How are you doing? :)

    Since past couple of days I’ve been attending the EuroPython conference at Bilbao, Spain and it has been an increíble experience so far! There are over a dozen amazing talks with something new to share every day and the super fun lightning talks at the end of the day. If for some reason you weren’t able to attend the conference then you may see the talks live at EuroPython YouTube channel.

    In this blog I would like to talk briefly about PEP498 - Literal String Interpolation in Python. Python supports multiple ways to format text strings (%-formatting, str.format() formatting and Templates). Each of these is useful in some ways, but they all lack in other aspects. For example, the simplest version of the format style is too verbose.

    place = "Bilbao, Spain"
    "EuroPython is happening at {place}".format(place=place)

    Clearly, there is redundancy: place is being used multiple times. Similarly, % formatting is limited in the types (int, str, double) that it can handle.

    f-strings are proposed in PEP498. They are basically literal strings with 'f' or 'F' as a prefix, and they embed expressions in braces that are evaluated at runtime. Let's see some simple examples:

    place = "Bilbao, Spain"
    print(f"EuroPython is happening at {place}")  # Simple enough, right?

    def say_hello():
        return "Hello"

    print(f'{say_hello()} there!')
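    On a Python 3.6 interpreter (still a pre-release at the time of writing), f-strings also accept the usual format specifiers after a colon. For example:

    ratio = 2 / 3
    print(f"ratio is roughly {ratio:.2f}")   # -> ratio is roughly 0.67
    print(f"{'EuroPython':>15}")             # right-align in a 15-character field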

    I think that’s simpler and better than other string formatting options. If this feature interests you and you want to learn more about it then I recommend checking out the PEP498 documentation.

    Cheers!

    August 06, 2016 08:39 PM

    ghoshbishakh
    (dipy)

    Google Summer of Code Progress August 7

    Yay! We have a dynamically generated gallery and tutorials page now!

    Progress so far

    The major changes are in the gallery and in the new tutorials page.

    Instead of showing the manually entered images from the admin panel, the gallery now fetches all images from all the tutorials in the latest documentation.

    This is actually done by scraping the tutorials page from the JSON docs.

    Although the docs are now built in JSON format, the body is still represented as an HTML string. As a result, there was no way out other than parsing the HTML, and the best HTML parsing library that I know of is Beautiful Soup.

    def get_doc_examples_images():
        """
        Fetch all images in all examples in latest documentation
    
        """
        doc = DocumentationLink.objects.filter(displayed=True)[0]
        version = doc.version
        path = 'examples_index'
        repo_info = (settings.DOCUMENTATION_REPO_OWNER,
                     settings.DOCUMENTATION_REPO_NAME)
        base_url = "http://%s.github.io/%s/" % repo_info
        url = base_url + version + "/" + path + ".fjson"
        response = requests.get(url)
        if response.status_code == 404:
            url = base_url + version + "/" + path + "/index.fjson"
            response = requests.get(url)
            if response.status_code == 404:
                return []
        url_dir = url
        if url_dir[-1] != "/":
            url_dir += "/"
    
        # parse the content to json
        response_json = response.json()
        bs_doc = BeautifulSoup(response_json['body'], 'html.parser')
        all_links = bs_doc.find_all('a')
    
        examples_list = []
        for link in all_links:
            if(link.get('href').startswith('../examples_built')):
                rel_url = "/".join(link.get('href')[3:].split("/")[:-1])
                example_url = base_url + version + "/" + rel_url + ".fjson"
                example_response = requests.get(example_url)
                example_json = example_response.json()
                example_title = strip_tags(example_json['title'])
    
                # replace relative image links with absolute links
                example_json['body'] = example_json['body'].replace(
                    "src=\"../", "src=\"" + url_dir)
    
                # extract title and all images
                example_bs_doc = BeautifulSoup(example_json['body'], 'html.parser')
                example_dict = {}
                example_dict['title'] = example_title
                example_dict['link'] = '/documentation/' + version + "/" + path + "/" + link.get('href')
                example_dict['description'] = example_bs_doc.p.text
                example_dict['images'] = []
                for tag in list(example_bs_doc.find_all('img')):
                    example_dict['images'].append(str(tag))
                examples_list.append(example_dict)
        return examples_list

    And all the extracted images are displayed in the honeycomb gallery.

    dipy gallery page

    Tutorials Page

    Although each version of the documentation has its own list of tutorials, we wanted a dedicated page which contains the tutorials with thumbnails and descriptions, grouped into several sections. So, similar to the gallery page, I parsed the tutorials index page, went into each tutorial and fetched the thumbnails and descriptions. This list of tutorials is then displayed as an expandable list of groups.

    def get_examples_list_from_li_tags(base_url, version, path, li_tags):
        """
        Fetch example title, description and images from a list of li tags
        containing links to the examples
        """
    
        examples_list = []
        url_dir = base_url + version + "/" + path + ".fjson/"
    
        for li in li_tags:
            link = li.find("a")
            if(link.get('href').startswith('../examples_built')):
                example_dict = {}
                # get images
                rel_url = "/".join(link.get('href')[3:].split("/")[:-1])
                example_url = base_url + version + "/" + rel_url + ".fjson"
                example_response = requests.get(example_url)
                example_json = example_response.json()
                example_title = strip_tags(example_json['title'])
    
                # replace relative image links with absolute links
                example_json['body'] = example_json['body'].replace(
                    "src=\"../", "src=\"" + url_dir)
    
                # extract title and all images
                example_bs_doc = BeautifulSoup(example_json['body'], 'html.parser')
                example_dict = {}
                example_dict['title'] = example_title
                example_dict['link'] = '/documentation/' + version + "/" + path + "/" + link.get('href')
                example_dict['description'] = example_bs_doc.p.text
                example_dict['images'] = []
                for tag in list(example_bs_doc.find_all('img')):
                    example_dict['images'].append(str(tag))
                examples_list.append(example_dict)
        return examples_list
    
    
    def get_doc_examples():
        """
        Fetch all examples (tutorials) in latest documentation
    
        """
        doc_examples = []
        doc = DocumentationLink.objects.filter(displayed=True)[0]
        version = doc.version
        path = 'examples_index'
        repo_info = (settings.DOCUMENTATION_REPO_OWNER,
                     settings.DOCUMENTATION_REPO_NAME)
        base_url = "http://%s.github.io/%s/" % repo_info
        url = base_url + version + "/" + path + ".fjson"
        response = requests.get(url)
        if response.status_code == 404:
            url = base_url + version + "/" + path + "/index.fjson"
            response = requests.get(url)
            if response.status_code == 404:
                return []
        url_dir = url
        if url_dir[-1] != "/":
            url_dir += "/"
    
        # parse the content to json
        response_json = response.json()
        bs_doc = BeautifulSoup(response_json['body'], 'html.parser')
    
        examples_div = bs_doc.find("div", id="examples")
        all_major_sections = examples_div.find_all("div",
                                                   class_="section",
                                                   recursive=False)
    
        for major_section in all_major_sections:
            major_section_dict = {}
            major_section_title = major_section.find("h2")
            major_section_dict["title"] = str(major_section_title)
            major_section_dict["minor_sections"] = []
            major_section_dict["examples_list"] = []
            all_minor_sections = major_section.find_all("div",
                                                        class_="section",
                                                        recursive=False)
    
            if len(all_minor_sections) == 0:
                # no minor sections, only examples_list
                all_li = major_section.find("ul").find_all("li")
                major_section_dict[
                    "examples_list"] = get_examples_list_from_li_tags(base_url,
                                                                      version,
                                                                      path,
                                                                      all_li)
            else:
                for minor_section in all_minor_sections:
                    minor_section_dict = {}
                    minor_section_title = minor_section.find("h3")
                    minor_section_dict["title"] = str(minor_section_title)
                    minor_section_dict["examples_list"] = []
    
                    all_li = minor_section.find("ul").find_all("li")
                    minor_section_dict[
                        "examples_list"] = get_examples_list_from_li_tags(base_url,
                                                                          version,
                                                                          path,
                                                                          all_li)
                    major_section_dict["minor_sections"].append(minor_section_dict)
            doc_examples.append(major_section_dict)
        return doc_examples

    dipy tutorials page

    What next?

    The GitHub statistics visualizations page is one major task. Another major task is to somehow make the automatically generated gallery and tutorials pages editable, so that we can change the thumbnails or descriptions. Also, the coding period is about to end in 2 weeks, so documenting the code and merging all pull requests is a priority.


    August 06, 2016 07:30 PM

    Valera Likhosherstov
    (Statsmodels)

    GSoC 2016 #5

    Dynamic Factor Model

    There is one more item in my proposal which I haven't yet mentioned in my reports, although I've been working on it before the refactoring phase and the TVP model implementation. This is the Markov switching dynamic factor model (MS-DFM); we will use the following specification:
    y, as usual, is an observation process, and (1) is the observation equation. f is a factor, changing according to (2), the factor transition equation, which is a VAR model with a Markov switching intercept. The observation error follows a VAR model, too, as (3) states.
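    Written out, the specification described above is roughly of the following form (the notation is my own illustration and may differ from the post's original figure):
    \[y_t = H f_t + e_t \qquad (1)\]
    \[f_t = \mu_{S_t} + \Phi_1 f_{t-1} + \dots + \Phi_p f_{t-p} + v_t \qquad (2)\]
    \[e_t = \Psi_1 e_{t-1} + \dots + \Psi_q e_{t-q} + w_t \qquad (3)\]
    with \(S_t\) the Markov switching regime and \(v_t\), \(w_t\) Gaussian noise terms.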
    Statsmodels already has a non-switching DFM implementation in the statespace.dynamic_factor.py file, which has almost the same specification but without the Markov switching intercept term in the factor transition equation. So the first challenge was to extend the DynamicFactor class and add an analogous, but non-switching, intercept. This parameter is required because it is used to initialize the switching intercept in Maximum Likelihood Estimation. The regime_switching/switching_dynamic_factor.py and regime_switching/tests/test_switching_dynamic_factor.py files contain my experiments with MS-DFM, which were unsuccessful for reasons discussed in the next section.

    Irresistible obstacles

    The non-switching DynamicFactor class is a big piece of code itself, and since a lot of its functionality is shared with the switching model, the only right solution is to extend it with a SwitchingDynamicFactor class. The problem is that this class wasn't designed to be extended, so things got quite tricky before I realised it was a bad idea. For example, I had to substitute DynamicFactor's KalmanSmoother instance with an ugly descendant of KimSmoother, with some interface changes to achieve compatibility with the non-switching model. After a series of similar sophisticated manipulations I came to the conclusion that it's impossible to construct a SwitchingDynamicFactor class without changes in the parent class. In my estimation, however, not that many changes are needed.
    Another problem concerns testing data. I use this Gauss code sample from the Kim and Nelson book. This is the only code I know of to test MS-DFM against. The disappointment is that this testing model is incompatible with the one presented above - its observation equation contains lagged factor terms, while ours uses only the current factor. I also tried some tricks, the main one being to group the lagged factors into one vector factor. After several errors and considerations, I figured out that this is a bad idea, because the transition noise covariance matrix becomes singular. The only solution I see now is to extend the DFM and MS-DFM models so that they can handle lagged factors in the observation equation, but this is a time-consuming challenge.

    What's next?

    The thing I'm working on right now is adding generic forecasting to the Kim filter, which is the last important feature to be added. I spent a couple of days just thinking about how to implement this, but now I'm finally writing the code. Forecasting should be a very visual thing, so I will add it to the existing notebooks, which also serves as a kind of testing.

    Literature

    [1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

    by Valera Likhosherstov (noreply@blogger.com) at August 06, 2016 09:58 AM

    Aakash Rajpal
    (italian mars society)

    Work Continues!

    Hey all, as the deadline for final submission nears, I have been working to finalize my project and fix the few bugs that keep showing up and annoying me. The project is about 80% done according to me :p. I am able to render the HUD dynamically on the Oculus now with my own demo model made in the Blender Game Engine. The only work left now is to integrate my HUD with the V-ERAS models for Blender and render the final scene on the Oculus DK2. I have been going through the models available in the IMS V-ERAS repository to find a model suitable to render the HUD on.

    My college has also started, so these final days will be very hectic. Hope for the best!


    by aakashgsoc at August 06, 2016 02:49 AM

    Nelson Liu
    (scikit-learn)

    (GSoC Week 10) scikit-learn PR #6954: Adding pre-pruning to decision trees

    The scikit-learn pull request I opened to add impurity-based pre-pruning to DecisionTrees and the classes that use them (e.g. the RandomForest, ExtraTrees, and GradientBoosting ensemble regressors and classifiers) was merged a week ago, so I figure that this would be an appropriate place to talk about what this actually does and provide an example of it in action.

    Decision Tree Node Impurity - A Recap

    Note: if you're familiar with what the "impurity" of a node in a decision tree is, feel free to skip this

    In decision tree-based classification and regression methods, the goal is to iteratively split to minimize the "impurity" of the partitioned dataset (see my week 2 blog post for more details about this). The definition of node impurity varies based on the method used to calculate it, but in rough terms it measures how "pure" a leaf node is. If a leaf node contains samples that all belong to one class (for classification) or have the same real-valued output (for regression), it is "pure" and thus has an impurity of 0.

    The ultimate goal of decision tree-based models is to split the tree such that each leaf node corresponds to the prediction of a single class, even if there is only one sample in that class. However, this can lead to the tree radically overfitting the data; it will grow in a manner such that it will create a leaf node for every sample if necessary. Overfitting is when the decision tree continues to grow and reduce the training set error, but at the expense of the test set error. In other words, it can basically memorize the samples of the training set, and may lose the ability to generalize well to new datasets. One method for avoiding overfitting in decision trees is pre-pruning.

    In pre-pruning, you stop the decision tree growth before it perfectly fits the training data; this is because (as outlined in the previous paragraph) fitting the training data perfectly often leads to overfitting.

    In the scikit-learn tree module, there are a variety of checks applied during tree growth to decide whether a node should be split or whether it should be declared a leaf node.

    • Is the current depth of the tree greater than the user-set max_depth parameter? max_depth is the maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. (depth >= max_depth)

    • Or is the number of samples in this current node less than the value of min_samples_split? min_samples_split is the minimum number of samples required to split an internal node. (n_node_samples < min_samples_split)

    • Or is the number of samples in this current node less than 2 * min_samples_leaf? min_samples_leaf is the minimum number of samples required to be in a leaf node. (n_node_samples < 2 * min_samples_leaf)

    • Or is the total weight of all of the samples in the node less than min_weight_leaf? min_weight_leaf defines the minimum weight required to be in a leaf node. (weighted_n_node_samples < min_weight_leaf)

    • Or lastly, is the impurity of the node equal to 0?

    By changing the value of each of these constructor parameters, it's possible to achieve a pseudo-prepruning effect. For example, setting the value of min_samples_leaf can define that each leaf has more than one element, thus ensuring that the tree cannot perfectly (over)fit the training dataset by creating a bunch of small branches exclusively for one sample each. In reality, what this is actually doing is simply just telling the tree that each leaf doesn't HAVE to have an impurity of 0.

    Enter min_impurity_split

    My contribution was to create a new constructor parameter, min_impurity_split; this value defines the minimum impurity that a node must have in order to not be declared a leaf. For example, if the user-defined value of min_impurity_split is 0.5, a node with an impurity of 0.7 would be split further, but nodes with impurities of 0.5 and 0.2 would be declared leaves and receive no further splitting. In this manner, it's possible to control how finely the decision tree fits the data, allowing for coarser fits if desired.
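    As a minimal usage sketch (assuming a scikit-learn build that includes this PR; the dataset and threshold values below are arbitrary), comparing a few thresholds on the Boston housing data might look like this:

    from sklearn.datasets import load_boston
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    boston = load_boston()
    X_train, X_test, y_train, y_test = train_test_split(
        boston.data, boston.target, random_state=0)

    for threshold in (1e-7, 1.0, 3.0):
        # nodes whose impurity falls below `threshold` become leaves
        tree = DecisionTreeRegressor(min_impurity_split=threshold, random_state=0)
        tree.fit(X_train, y_train)
        mse = mean_squared_error(y_test, tree.predict(X_test))
        print(threshold, tree.tree_.node_count, mse)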

    This is great and all, but does it improve my estimator's performance?

    min_impurity_split helps to control over-fitting and as a result can improve your estimator's performance if it is overfitting on the dataset. I ran a few benchmarks in which I plotted the number of nodes in trees fit with a variety of parameters, and their performance on a held-out test set on the Boston housing prices regression task. The code to generate these plots can be found here.

    Fitted Tree Number of Nodes vs. Parameters

    Prediction Mean Squared Error vs. Parameters

    Note: in the chart on the bottom, the y-axis represents the Mean Squared Error (MSE) -- thus, lower error is better.

    These graphs show some interesting things that I want to point out. If you look at the first chart, you'll see that the tree grown with default parameters is massive, with over 700 nodes. With large trees, it's generally quite easy to overfit since it indicates that the tree is maybe creating leaf nodes for individual samples, which is generally detrimental for generalization. On the other hand, the tree grown with min_impurity_split = 3 is much more modest, with ~150 nodes. If you examine the graph, you'll see that as the value of min_impurity_split decreases, the number of nodes increases which makes sense.

    Looking at the second chart, you will see that setting various values of min_impurity_split and other parameters generally serves to improve the performance of the estimator by reducing MSE. These parameters all limit tree growth in some way. There are some notable exceptions though: for example, note that the tree grown with max_depth = 3 has a relatively high MSE. This is because, referencing the first chart, it is tiny! As a result, this tree could have been under-fit. Thus, it's important to maintain a balance in tree size. Trees with too many nodes overfit easily, but you don't want to prune so much that you render the tree incapable of fitting in general! Let's take a look at some plots to see how tuning the value of min_impurity_split affects tree size, training set accuracy, and test set accuracy.

    min_impurity_split_vs_size

    min_impurity_split_vs_train_mse

    min_impurity_split_vs_test_mse

    These images confirm our previous intuitions. As min_impurity_split increases, it's easy to see that the number of nodes in the tree decreases. Similarly, the training set mean squared error also increases, since the tree is no longer memorizing samples. The test set accuracy is more unpredictable, however; as a result, you should always try different values of min_impurity_split to find the one that works best for your task.

    If you have any questions, comments, or suggestions, you're welcome to leave a comment below.

    Per usual, thanks goes out to my mentors Raghav RV and Jacob Schreiber for taking a look at the PR and reviewing my code.

    You're awesome for reading this! Feel free to follow me on GitHub if you want to track the progress of my Summer of Code project, or subscribe to blog updates via email.

    by Nelson Liu at August 06, 2016 12:30 AM

    August 05, 2016

    tsirif
    (Theano)

    Multi-GPU/Node interface in Platoon

    Over the last few weeks I have been working on a new interface in Platoon which supports collective operations on Theano’s GPU shared variables across multiple GPUs on multiple hosts. This will enable Platoon to train your Theano models using multiple GPUs even if they do not reside on the same host.

    Usage

    In order to use it, a worker file needs to be provided. A worker file defines the training process of a single set of model parameters in a parallel and distributed manner. Optionally, in case you want to extend the distributed computation capabilities of the training process, you are encouraged to provide a controller file which extends the default one (the platoon.controller module) in this framework. The user must invoke the platoon2-launcher script in order to start training with the new interface.

    Platoon is configured through the command-line arguments of this launcher and, in their absence (or if needed), through environment variables or Platoon configuration files. Please read platoonrc.conf in the package’s root directory to learn about every way Platoon can be configured.

    If single-node mode is explicitly specified through command-line arguments, the specified devices will be used in the GPU communicator world in the order they are parsed. The same applies to lists of devices found in Platoon environment variables or configuration files.

    e.g. usage:

    • platoon2-launcher lstm -D cuda0 cuda3 (explicit config)
    • platoon2-launcher lstm (config with envs/files - may be multi-node)

    If multi-node mode is explicitly specified through command-line arguments, extra configuration through the appropriate per-host environment variables or files needs to be done in order to describe which devices will be used on each host. Host names are given the same way they are given to MPI’s mpiexec.

    e.g. usage:

    • platoon2-launcher lstm -nw 2 2 -H lisa0 lisa1 (2 gpus on lisa0 and 2 gpus on lisa1)

    Please note that this launcher is used to set up the new worker interface (the old one is still usable - but not in multi-node configs). The new worker interface currently supports only CUDA devices. NVIDIA’s NCCL collectives library and pygpu are required for multi-GPU, while mpi4py is additionally required for multi-node.

    API description and how it works

    I will now describe how the new API works and its usage in training code.

    Platoon uses a controller/worker architecture in order to organize multiple hosts which own multiple GPUs. A controller process is spawned on each host; it is responsible for organizing its worker processes and communicating with the controller processes on other hosts. In addition, there are as many worker processes on each host as there are devices participating in the computation. Each worker process is responsible for a single computation device: a worker process contains Theano code which acts on a single device and uses a Worker instance in order to exploit multi-GPU/node computation.

    platoon architecture

    By default, someone who wishes to write code for training a model with Platoon must write the code which will run in the worker processes. Theano functions are created as usual and executed on a single Theano device. This device is configured for the worker process by the THEANO_FLAGS="device=<...>" environment variable, which is set by the launching procedure. Alongside single-GPU computation there will be multi-GPU/node computations, triggered by calls to Platoon’s interface. While developing training code, the user must create the corresponding Theano GPU shared variables which will be used as arguments to Platoon’s new interface.

    import os
    from platoon.worker import Worker
    import theano
    import numpy as np
    
    # instantiate a worker
    worker = Worker(control_port=5567)
    # how many workers are there across all hosts
    total_nw = int(os.environ['PLATOON_TEST_WORKERS_NUM'])
    
    # make Theano shared variables for input and output
    inp = np.arange(32, dtype='float64')
    sinp = theano.shared(inp)
    out = np.empty_like(inp)
    sout = theano.shared(out)
    
    # execute interface
    worker.all_reduce(sinp, '+', sout)
    
    expected = total_nw * inp
    actual = sout.get_value()
    assert np.allclose(expected, actual)
    

    Minimal example code for a worker process

    When a call to worker.all_reduce is made, the internal pygpu.gpuarray.GpuArrays are fetched and used as arguments to the corresponding AllReduce collective operation in a local pygpu GPU communicator world. This GPU comm world is local in the sense that it is composed only of a single host’s GPUs, in order to effectively utilize NVIDIA’s optimized NCCL framework. So we expect concurrent NCCL operations on each host. When the pygpu collective has finished and we are running a multi-node training procedure, a single worker out of the workers on each host copies the result from its GPU to a memory buffer on the host. This memory buffer is shared (by means of POSIX IPC) among all worker processes on a host and their controller process. That worker then requests its controller to execute the corresponding MPI collective operation with the other controller processes in an inter-node MPI communicator world. The result from this operation is received in the same shared buffer. When the MPI operation has finished, all workers concurrently write the result from the shared buffer back to the destination GpuArray on their GPUs.

    # Execute collective operation in local NCCL communicator world
    res = self._regional_comm.all_reduce(src, op, dest)
    
    # Create new shared buffer which corresponds to result GpuArray buffer
    if dest is None:
        self.new_linked_shared(res)
    else:
        if dest not in self.shared_arrays:
            self.new_linked_shared(dest)
        res = dest
    res_array = self.shared_arrays[res]
    
    self.lock()
    first = self.send_req("platoon-am_i_first")
    if first:
        # Copy from GpuArray to shared memory buffer
        internal_res.read(res_array)
    
        # Request from controller to perform the same collective operation
        # in MPI communicator world using shared memory buffer
        self.send_req("platoon-all_reduce", info={'shmem': self._shmem_names[res],
                                                  'dtype': str(internal_res.dtype),
                                                  'op': op})
    self.unlock()
    
    # Concurrently copy from shared memory back to result GpuArray
    # after Controller has finished global collective operation
    internal_res.write(res_array)
    
    if dest is None:
        return res
    

    Simplified code from Worker class demonstrating program flow

    Right now, I am thoroughly testing this new interface. I am interested to see how the system behaves if an unexpected error occurs; I expect processes to shut down as cleanly as possible. For the next steps, I would like to include modules in Platoon which will allow creating training and validation procedures with ease through ready-to-use, configurable classes of training parts. This way Platoon will also provide a high-level gallery of reusable training algorithms for multi-GPU/node systems.

    Till then, keep on coding
    Tsirif

    by Christos Tsirigotis (tsirif@gmail.com) at August 05, 2016 11:56 PM

    Adhityaa Chandrasekar
    (coala)

    My EuroPython Experience

    EuroPython 2016

    What a blast! I had a lot of fun at EuroPython, and it wasn't just the conference.

    To start off, it was exciting to meet the guys: Lasse, Max, Tushar, Udayan, Adrian, Alex and Justus. Previously, my only interaction with them had been through Gitter. We had a lot of fun (more on that later): every day after the conference, we all went over to the Airbnb and did our own sprints, which I enjoyed from start to finish.

    And the conference itself was one of the best experiences I've ever had: I learned so much about Python: iterables, meta-classes, performance optimizations, parallel computing and much, much more.

    But my favorites were the Lightning Talks. The Lightning Talks are an hour-long event where several speakers each get five minutes on stage to talk about virtually anything they want. Lasse got two opportunities on stage and Max gave a talk as well. And then on the last day, the whole team got to present a video, which we made the night before. It is one of the most hilarious things I've seen :D

    I also had the opportunity to co-conduct a 3-hour workshop on making a contribution to open source with Tushar. It was an interesting experience and I never fully understood the amount of effort that needs to go into a talk/workshop till then.

    And on the last two days, we had sprints. I just juggled with several small issues and PRs (and of course, my GSoC project). It was different talking in person with everybody instead of Gitter (although we did use Gitter when the person was over 3 feet away :P). And we got a lot of stuff done (and I got a bar of chocolate from Lasse!).

    Anyway, that was my EuroPython experience. I went on a tour to France, Belgium and Poland after that for a week: Europe is truly beautiful. Hope I can make it next year as well :)

    Adhityaa
    :wq

    August 05, 2016 06:30 PM

    Prayash Mohapatra
    (Tryton)

    Nearing Completion

    Well yes, I recently made my (hopefully) final commit to the repo. With everything going as per the plan, I am really happy to strike out the items to be done in my workflowy list.

    The last two weeks’ work enables one to navigate and select predefined exports using the mouse and key presses. CSV is being properly generated for the selected records. I also submitted a PR adding custom quote character support for PapaParse’s unparse method.

    Removed the open/save option from the export dialog as the generated export file is being sent to the browser to handle.

    August 05, 2016 05:56 PM

    liscju
    (Mercurial)

    Coding Period IX - X Week

    In the last two weeks I have done some minor style fixes, but also added support for connecting to the redirection server through HTTPS. It was not hard, because in Mercurial opening a connection via urlopener.url supports the https protocol. I also added HTTPS support in the example server, using ssl.wrap_socket on the BaseHttpServer.HttpServer sockets. There is one problem with this solution: clients connect to the redirection server by themselves, so they need to have the redirection server's certificate in order to trust it, or to trust the redirection server by default. Probably the best solution would be for the main repository server to propagate authorization information to clients, but this needs to be carefully designed because the solution supports a few protocols.
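    For reference, wrapping the example server's socket looks roughly like this (a minimal Python 2 sketch with a placeholder handler and an assumed certificate path, not the actual extension code):

    import ssl
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

    httpd = HTTPServer(('localhost', 4443), BaseHTTPRequestHandler)
    # serve the redirection server over HTTPS instead of plain HTTP
    httpd.socket = ssl.wrap_socket(httpd.socket,
                                   certfile='server.pem',  # assumed certificate path
                                   server_side=True)
    httpd.serve_forever()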

    The second thing I did was add a command for converting a large-file repository to one that uses the redirection feature. This was not hard either, because Mercurial already has a module for converting repositories revision after revision. My goal was to detect which files in each revision are large files, send them to the redirection server and make sure they are not put in the local store or cache.

    by Piotr Listkiewicz (noreply@blogger.com) at August 05, 2016 12:50 PM

    Abhay Raizada
    (coala)

    Breaking Lines

    As the GSoC period comes to an end I have only two major tasks left to do: one is making a LineBreakBear, which suggests line breaks when the user has lines of code that exceed a max_line_length setting, and the other is adding indents based on keywords in the IndentationBear.

    This week I was able to devise a simple algorithm to suggest line breaks. If you’ve followed my blogs you’d know that there’s something I like to call an encapsulator 😛 - it’s a fancy name for different (but not all) types of brackets. The algorithm is as follows:

    1. Get all occurrences of lines which exceed max_line_length.
    2. Check whether these lines have an encapsulator which starts before the limit.
    3. Find the last encapsulator started in this line before the limit.
    4. Suggest a line break at that point, with the new line indented in accordance with the indentation_width setting.

    Now this algorithm is really simple and does not consider border cases such as hanging-indents.
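    A rough Python sketch of the idea (hypothetical, not the actual LineBreakBear code; the bracket set and the yielded values are my own choices):

    def suggest_breaks(lines, max_line_length=79, indent_width=4,
                       encapsulators="([{"):
        for lineno, line in enumerate(lines, start=1):
            if len(line) <= max_line_length:
                continue
            # position of the last opening bracket before the limit, or -1
            last_open = max(line.rfind(ch, 0, max_line_length)
                            for ch in encapsulators)
            if last_open == -1:
                continue
            new_indent = len(line) - len(line.lstrip()) + indent_width
            yield lineno, last_open + 1, new_indent

    for where in suggest_breaks(open("some_file.py").readlines()):
        print("break line {} after column {}, indent new line by {}".format(*where))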

    Hopefully by the next blog post I’ll have completed my project. I’ll have lots to share about my experience this year :)


    by abhsag at August 05, 2016 12:11 PM

    aleks_
    (Statsmodels)

    if remaining_time < 20 * seconds_per_day: get_coding()

    2 and a half weeks left - and quite a few tasks too. However, I am confident that I will do a good job in the remaining time, not least because last week showed that some items on the todo-list can be ticked off quickly. I hope that I can tick off the next item (Granger causality) this week. The item after that (impulse-response analysis) will be more straightforward, as it will mainly be about reusing existing code from the VAR framework.
    With that, let's get coding! : )

    by yogabonito (noreply@blogger.com) at August 05, 2016 08:04 AM

    Utkarsh
    (pgmpy)

    Introduction to Markov Chains

    Markov Chains are an integral component of Markov Chain Monte Carlo (MCMC) techniques. In MCMC, a Markov Chain is used to sample from some target distribution. This post tries to develop basic intuition about what a Markov Chain is and how we can use it to sample from a distribution.

    In layman's terms we can define a Markov Chain as a collection of random variables having the property that, given the present, the future is conditionally independent of the past. This may not make sense to you right now, but it will be the core of the discussion when we get to MCMC algorithms.

    Let us now take a formal (mathematical) look at the definition of a Markov Chain and some of its properties. A Markov Chain is a stochastic process that undergoes transitions from one state to another on a given set of states, called the state space of the Markov Chain.

    I used the term stochastic process, which is a random process that evolves with time. We can perceive it as the probabilistic counterpart of a deterministic process: instead of evolving in only one (deterministic) way, the process can evolve in multiple directions, i.e. it has some kind of indeterminacy in its future. One example of a stochastic process is Brownian Motion.

    A Markov chain is characterised by the following three elements:

    • A state space \(S\), which is the set of values (states) the chain is allowed to take.

    • A transition model \(T(x, x')\), which specifies, for each pair of states \(x, x'\), the probability of going from \(x\) to \(x'\).

    • An initial state distribution \(\pi^{(0)}\), which defines the probability of being in any one of the possible states at the initial iteration t = 0.

    We can define the distribution over subsequent times \(t = 1, 2, 3, \ldots\) using the chain dynamics as

    \[\pi^{(t+1)}(x') = \sum_{x \in S} \pi^{(t)}(x)\, T(x, x')\]

    Earlier I described a property of Markov chains:

    Given the present, the future is conditionally independent of the past

    This property is called the Markov property or the memoryless property of a Markov chain, and is mathematically described as \(P(X^{(t+1)} \mid X^{(t)}, X^{(t-1)}, \ldots, X^{(0)}) = P(X^{(t+1)} \mid X^{(t)})\).

    There are two other properties of interest which we usually find in most real-life applications of Markov Chains:

    • Stationarity: Let \(X_1, X_2, \ldots\), a sequence of random elements of some set, be a stochastic process; the process is stationary if for every positive integer \(k\) the distribution of the k-tuple \((X_{n+1}, \ldots, X_{n+k})\) does not depend on \(n\). Thus a Markov Chain is stationary if it is a stationary stochastic process. This stationarity property in Markov Chains implies stationary transition probabilities, which in turn give rise to an equilibrium distribution. Not all Markov Chains have an equilibrium distribution, but all Markov Chains used in MCMC do.

    • Reversibility: A Markov Chain is reversible if the probability of the transition \(x \to x'\) is the same as the probability of the reverse transition \(x' \to x\). Reversibility in a Markov Chain implies stationarity.

    Finite State Space Markov Chain

    If the state space of a Markov Chain takes on a finite number of distinct values, the transition operator can be defined using a square matrix \(T\).

    The entry \(T_{ij}\) represents the transition probability of moving from state \(i\) to state \(j\).

    Let us first look at an example Markov chain and understand these terms using it. I'll use a Markov chain to simulate the Gambler's Ruin problem. In this problem, suppose that there are two players, P1 and P2, playing poker. Initially both of them have $2 with them. In each round the winner gets a dollar and the loser loses one, and the game continues until one of them loses all his money. Consider that the probability of P1 winning a round is 0.49. Our task is to estimate the probability of P1 winning the complete game. Here is how our Markov chain looks:

    Gambler's Ruin Chain

    The state space of the Markov Chain is \(\{0, 1, 2, 3, 4\}\), the amount of money P1 can have. As the state space is finite, we can write the transition model in the form of a matrix:

    transition = [[1, 0, 0, 0, 0],
                  [0.51, 0, 0.49, 0, 0],
                  [0, 0.51, 0, 0.49, 0],
                  [0, 0, 0.51, 0, 0.49],
                  [0, 0, 0, 0, 1]]
    

    The initial money with P1 is $2, so we can take the start state to be the vector start = [0, 0, 1, 0, 0]. Now with this characterisation we will simulate our Markov Chain and try to reach the stationary distribution, which will give us the probability of winning.

    import numpy as np
    import matplotlib.pyplot as plt
    iterations = 30  # Simulate chain for 30 iterations
    initial_state = np.array([[0, 0, 1, 0, 0]])
    transition_model = np.array([[1, 0, 0, 0, 0], [0.51, 0, 0.49, 0, 0], [0, 0.51, 0, 0.49, 0],
                                 [0, 0, 0.51, 0, 0.49], [0, 0, 0, 0, 1]])
    transitions = np.zeros((iterations, 5))
    transitions[0] = initial_state
    for i in range(1, iterations):
        transitions[i] = np.dot(transitions[i-1], transition_model)
    labels = [0, 0, 0, 0, 0, 0]
    plt.figure()
    plt.hold(True)
    plt.plot(transitions)
    labels[0], = plt.plot(range(iterations), transitions[:,0], color='r')
    labels[1], = plt.plot(range(iterations), transitions[:,1], color='b')
    labels[2], = plt.plot(range(iterations), transitions[:,2], color='g')
    labels[3], = plt.plot(range(iterations), transitions[:,3], color='m')
    labels[4], = plt.plot(range(iterations), transitions[:,4], color='c')
    labels[5], = plt.plot([20, 20], [0, 1.2], color='k', linestyle='dashed')
    plt.legend(labels, ['money=0','money=1','money=2','money=3', 'money=4', 'burn-in'])
    plt.hold(False)
    #plt.show()
    print("Probability of winning the complete game for P1 is", transitions[iterations - 1][4])
    

    The output of the above code sample is: Probability of winning the complete game for P1 is 0.479978863078, which is a good approximation of the exact result 0.48 (see the link for the calculation of the exact result).

    Gambler_chain_trace

    In the trace plot of the Markov chain one can see that in the beginning there are fluctuations, but after some time the chain reaches an equilibrium/stationary distribution, as the probabilities no longer change much in subsequent iterations. Mathematically, a distribution \(\pi^*\) is a stationary distribution if it satisfies the property \(\pi^* = \pi^* T\).

    Using the above property we can see that our chain has approximately reached the stationary distribution, as the following condition returns True.

    np.allclose(transitions[-1], np.dot(transitions[-1], transition_model), atol=1e-04)
    

    The initial period of about 20 iterations (here) is called the burn-in period of the Markov Chain (see the dotted line in the plot) and is defined as the number of iterations it takes the chain to move from the initial conditions to the stationary distribution. I find burn-in period to be a misleading term, so I'll call it the warm-up period. The burn-in term was introduced by early authors of MCMC who came from a physics background and has been used ever since :/ .

    One interesting thing about stationary Markov chains is that it is not necessary to iterate sequentially to predict a future state. One can predict the future state by raising the transition operator to the N-th power, where N is the iteration at which we want to predict, and then multiplying it by the initial distribution. For example, if we wanted to predict the probabilities after 24 iterations we could simply have done:
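    Something along the following lines, reusing initial_state and transition_model from the code above (my own version of the snippet):

    # jump straight to iteration 24 by taking the 24th power of the transition matrix
    after_24 = np.dot(initial_state, np.linalg.matrix_power(transition_model, 24))
    print(after_24)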

    Let's look at a more interesting application of a stationary Markov chain. Here we will create our own naive page-ranking algorithm using a Markov Chain. To compute the transition probability from page \(x\) to page \(y\) (for all pairs \(x\), \(y\)) we use a configuration parameter \(\alpha\) and two factors, which depend on the number of pages that \(x\) links to and on whether page \(x\) has a link to page \(y\). Here is the Python code for the same:

    import matplotlib.pyplot as plt
    import numpy as np
    
    alpha = 0.77  # Configuration parameter
    iterations = 20
    num_world_wide_web_pages = 4.0
    # Consider world wide web has 4 web pages only
    # Following is mapping between number of links to page
    links_to_page = {0: 3, 1: 3, 2: 1, 3: 2}
    
    # Returns transition probability of x -> y
    def get_transition_probabilities(links_to_page, linked):
        global alpha
        global num_world_wide_web_pages
        constant_val = (1.0 - alpha)/num_world_wide_web_pages
        if linked is True:
            return (alpha/links_to_page) + constant_val
        else:
            return constant_val
    
    transition_probs = np.zeros((4,4))
    # Page 1 is not linked to itself
    transition_probs[0][0] = get_transition_probabilities(links_to_page[0], False)
    # Page 1 is linked to every other page
    for i in range(1,4):
        transition_probs[0][i] = get_transition_probabilities(links_to_page[0], True)
    # Page 2 is not linked to itself
    transition_probs[1][1] = get_transition_probabilities(links_to_page[1], False)
    # Page 2 is linked to every other page
    for i in [0, 2, 3]:
        transition_probs[1][i] = get_transition_probabilities(links_to_page[1], True)
    # Page 3 is only linked to page 4
    transition_probs[2][3] = get_transition_probabilities(links_to_page[2], True)
    # Page 3 is not linked to every other page except 4
    for i in range(3):
        transition_probs[2][i] = get_transition_probabilities(links_to_page[2], False)
    # Page 4 is linked to 1 and 3 and is not linked to 2 and itself
    for i in range(4):
        transition_probs[3][i] = get_transition_probabilities(links_to_page[3], not i%2)
    
    transitions = np.zeros((iterations, 4))
    transitions[0] = np.array([1, 0, 0, 0])  # Starting markov chain from page 1, initial distribution
    
    for i in range(1, iterations):
        transitions[i] = np.dot(transitions[i-1], transition_probs)
    
    labels = [0, 0, 0, 0, 0]
    plt.figure()
    plt.hold(True)
    labels[0], = plt.plot(range(iterations), transitions[:,0], color='b')
    labels[1], = plt.plot(range(iterations), transitions[:,1], color='r')
    labels[2], = plt.plot(range(iterations), transitions[:,2], color='g')
    labels[3], = plt.plot(range(iterations), transitions[:,3], color='k')
    labels[4], = plt.plot([10, 10], [0, 1], color='y', linestyle='dashed')
    plt.legend(labels, ['page 1', 'page 2', 'page 3', 'page 4', 'burn-in'])
    plt.hold(False)
    plt.show()
    

    page_rank_trace

    Our algorithm will rank pages in order Page 4, Page 3, Page 1, Page 2 :o .

    Continuous State-Space Markov Chains

    A Markov chain can also have a continuous state space that exists in the real numbers \(\mathbb{R}\). In this case we cannot represent the transition operator as a matrix; instead we represent it as a continuous function on the real numbers. Like finite state-space Markov chains, continuous state-space Markov chains also have a warm-up period and a stationary distribution, but here the stationary distribution is over a continuous set of variables.

    Let's look at an example of how to use a continuous state-space Markov chain to sample from a continuous distribution. Here our transition operator will be a normal distribution with unit variance and mean equal to half the distance between zero and the previous state. We will throw away a certain number of states generated at the start, as they fall in the warm-up period; the subsequent states that our chain reaches in the stationary distribution will be our samples. We can also run multiple chains simultaneously to draw samples more densely.

    import numpy as np
    import matplotlib.pyplot as plt
    
    np.random.seed(71717)
    warm_up = 100
    n_chains = 3
    
    transition_function = lambda x, n_chains: np.random.normal(0.5*x, 1, n_chains)
    n_iterations = 1000
    x = np.zeros((n_iterations, n_chains))
    x[0] = np.random.randn(n_chains)
    
    for it in range(1, n_iterations):
        x[it] = transition_function(x[it-1], n_chains)
    
    plt.figure()
    plt.subplot(222)
    plt.plot(x[0:200])
    plt.hold(True)
    minn = min(x.flatten())
    maxx = max(x.flatten())
    l = plt.plot([warm_up, warm_up],[minn, maxx], color='k', lw=3)
    plt.legend(l, ['Warm-up'])
    plt.title('Trace plot of first 200 samples')
    plt.hold(False)
    plt.subplot(224)
    plt.plot(x)
    plt.hold(True)
    l = plt.plot([warm_up, warm_up],[minn, maxx], color='k', lw=3)
    plt.legend(l, ['Warm-up'], loc='lower right')
    plt.title("Trace plot of entire chain")
    plt.hold(False)
    samples = x[warm_up+1:,:].flatten()
    plt.subplot(121)
    plt.hist(samples, 100)
    plt.legend(["Markov chain samples"])
    mu = round(np.mean(samples), 2)
    var = round(np.var(samples), 2)
    plt.title("mean={}, variance={}".format(mu, var))
    plt.show()
    

    trace_plot_continuous

    Ending Note

    In the above examples we deduced the stationary distribution based on observation and gut feeling :P . However, in order to use Markov chains to sample from a specific target distribution, we have to design the transition operator such that the resulting chain reaches a stationary distribution that matches the target distribution. This is where MCMC methods come to the rescue.

    August 05, 2016 12:00 AM

    jbm950
    (PyDy)

    GSoC Week 12

    This week my main work was on Featherstone’s articulated body algorithm. I started by prototyping what I thought his algorithm might look like in Python code (the algorithm was pulled from chapter 7 of his book). With the passes prototyped it was apparent that I would need a full description of the kinematic tree, so I prototyped building the kinematic tree from a single “base” body. I then went on to see what it would look like if the kinematic tree was built during the first pass of his articulated body algorithm, and decided that keeping the two separate would result in cleaner code.

    With the three passes prototyped and the kinematic tree built, I started digging into Featherstone’s book to better determine the definition of each of the variables in the algorithm. While doing this I ended up reading a second source where Featherstone describes the articulated body algorithm, and it was helpful in furthering my understanding of the algorithm, as it is a condensed summary. I then compared the written version of the algorithm in his book and this article with the two MATLAB versions he has posted online and the Python version he provides a link to online. This helped me see which terms he includes in his book but not in his code. It also helped me see what code for the algorithm might look like.

    After working on the mock-up of the passes and trying to better understand them, I switched focus to the joint code that needs to be finished so that it can be used in my implementation of the articulated body algorithm. This has led to some confusion about the design decisions that were made in the past when putting together the joint code, and this is the stage I am currently at as I await feedback on some of my questions.

    This week I also looked over a couple of documentation PRs. One was a simple matter of fixing some indentation and seems mostly ready to merge, but the second turned some docstrings into raw strings so they could add LaTeX math code. I don’t know what the general stance is on the latter, but I’m of the opinion that the docstrings should be human readable, since people may actually look through the code for them or hope that help(function) provides something useful. In this case the LaTeX math code is cluttered and would be better off in .rst files, where people will only read the rendered version. On that PR I am awaiting a response from someone with SymPy to see if this is indeed preferred.

    Future Directions

    Hopefully I’ll receive some feedback about the joints and Featherstone’s method so I can keep moving forward with these. In the meantime there are a few other bits of code I will need to complete that the algorithm uses and that are not directly related to my questions. If I finish these tasks before receiving feedback, I will move forward with changing the joint code as I think would be best.

    PR’s and Issues

    • (Open) [WIP] Added system.py to physics/mechanics PR #11431
    • (Open) Intendation fixes – sympy/concrete/summations.py PR #11473
    • (Open) Adjustments to Legendre, Jacobi symbols docstrings PR #11474
    • (Open) [WIP] FeatherstonesMethod PR #11415

    August 05, 2016 12:00 AM

    August 04, 2016

    Raffael_T
    (PyPy)

    Only bug fixes left!


    All changes from CPython have been implemented in PyPy, so all that's left to do now is fixing some bugs. Some minor changes had to be made, because not everything in CPython has to be implemented in PyPy. For example, in CPython slots are used to check the existence of a function and then call it. The type of an object holds the information about valid functions, stored as elements inside structs. Here's an example of how the __await__ function gets called in CPython:
    ot = Py_TYPE(o);
    if (ot->tp_as_async != NULL) {
        getter = ot->tp_as_async->am_await;
    }
    if (getter != NULL) {
        PyObject *res = (*getter)(o);
        if (res != NULL) { ... }
        return res;
    }
    PyErr_Format(PyExc_TypeError,
                 "object %.100s can't be used in 'await' expression",
                 ot->tp_name);
    return NULL;

    This 'getter' points to the am_await slot in typeobject.c. There, a lookup is done with '__await__' as the parameter. If it exists, it gets called; an error is raised otherwise.
    In PyPy all of this is way simpler. Practically, I just replace the getter with the lookup for __await__. All I want to do is call the method __await__ if it exists, and that's all there is to it. My code now looks like this:
    w_await = space.lookup(self, "__await__")
    if w_await is None: ...
    res = space.get_and_call_function(w_await, self)
    if res is not None: ...
    return res


    I also fixed the _set_sentinel problem I wrote about in the last post. All dependency problems with other (not yet covered) Python features have been fixed. I can already execute simple programs, but as soon as things get a little more complex and use certain asyncio features, I get an error about the GIL (global interpreter lock):
    “Fatal RPython error: a thread is trying to wait for the GIL, but the GIL was not initialized”
    First I have to read some descriptions of the GIL, because I am not sure where this problem could come from in my case.
    There are also many minor bugs at the moment. I already fixed one of the bigger ones, which didn't allow async or await to be used as variable names. I also just learned that something I implemented does not work with RPython, which I wasn't aware of. My mentor is helping me out with that.
    I also have to write more tests, because they are the safest and fastest way to check for errors. There are a few things I didn't test enough, so I need to catch up a bit on writing tests.


    Things are not moving forward as fast as I would like, because I often come across completely new things which I need to study first (like the GIL in this case, or the memoryview objects from the last blog entry). But there really shouldn't be much left to do until everything works, so I am pretty optimistic about the time I have left. If I strive to complete this task soon, I am positive my proposal will be successful.

    by Raffael Tfirst (noreply@blogger.com) at August 04, 2016 07:36 PM

    Ranveer Aggarwal
    (dipy)

    Working on a 2D Panel

    Last week, we built a basic 3D orbital menu with most of the existing 2D elements successfully ported to 3D. Sliding in 3D is still a bit of a problem, and we are exploring ways to do it. For the time being 3D sliding has been pushed back to later.

    The next component of the project is building a panel. A panel is a collection of (for now) 2D elements. It is synonymous with a window.

    Implementation

    We used a 2D rectangle to give the panel a background. The panel has an add_element function that takes in a 2D UI element and its relative position as arguments. If the size of the panel is 200x200 pixels and if I specify the relative position of the 2D UI element as (0.4, 0.4), then the position of the element inside the panel (with the panel’s lower left corner as the origin) will be (0.4*200, 0.4*200). Applying appropriate transformations will get the element to the position where we want it to be.
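    A tiny illustrative sketch of that relative-position arithmetic (a hypothetical helper, not the actual dipy panel code):

    def element_position(panel_size, relative_position, panel_lower_left=(0, 0)):
        # scale the relative coordinates by the panel size and offset by its corner
        width, height = panel_size
        rel_x, rel_y = relative_position
        origin_x, origin_y = panel_lower_left
        return (origin_x + rel_x * width, origin_y + rel_y * height)

    # a 200x200 panel with an element at relative (0.4, 0.4) -> (80.0, 80.0)
    print(element_position((200, 200), (0.4, 0.4)))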

    The Result

    Here’s how the panel currently looks like:

    The 2D Panel

    What Next

    Next, certain issues with the panel need to be fixed and certain enhancements made; for example, we want to be able to click and drag the panel around. Also, we want the panel to be left-aligned or right-aligned.

    August 04, 2016 01:15 PM

    srivatsan_r
    (MyHDL)

    Nearing completion!

    The RISC-V project is almost over; just some tests have to be completed. This week I worked on the controller module that generates control signals after decoding the instructions. I then made the mul_div module.

    It was easy to make these modules as I had the reference code in Verilog for the V-scale processor.

    While writing the test for the mul_div module I faced some problems, as I was not getting the expected output. I couldn’t figure out why that happened, so now my project partner is debugging the code.

    I am currently writing tests for the controller module and trying to figure out how exactly the controller works by giving it some inputs and checking the output.

    Hopefully we will complete this project within a week.


    by rsrivatsan at August 04, 2016 06:26 AM

    Anish Shah
    (Core Python)

    GSoC'16: Week 9 and 10

    GSoC

    Just a few weeks are left before GSoC ends. I am working on completing my patches and on a blog post for the final submission.

    Reviews

    I submitted 6 patches as a part of my GSoC work.

    1. Add a GitHub PR field on issue’s page

      Status: Complete

      I cleaned up the logic and the patch is much simpler than the one I submitted first!

    2. Link GitHub PR in issue comments

      Status: Complete

    3. GitHub webhooks

      Status: Almost Complete

      I updated the patch according to the reviews suggested by my mentor. The initial idea is completed. I added just one new thing that still needs to be reviewed - if issue is not referenced in PR title/body, then a new issue is created on b.p.o.

    4. PR status on issue’s page

      Status: Almost Complete

      I have updated the patch. It just needs one final review. :)

    5. PR comments on bpo

      Status: In Progress

      I need some inputs from the PSF community on how many GitHub comments we need to add on bpo. You can find the e-mail thread here.

    6. Convert patches to PR

      Status: In Progress

    Work Product Submission

    For my final submission, I’m creating one big blogpost with all the documents of the work that I have done till now. :) It’s still a work in progress. You can find it here.

    Thank you for reading this blogpost. This is it for today. See you again. :)

    August 04, 2016 12:00 AM

    August 03, 2016

    sahmed95
    (dipy)

    IVIM documentation

    Intravoxel incoherent motion

    The intravoxel incoherent motion (IVIM) model describes diffusion and perfusion in the signal acquired with a diffusion MRI sequence that contains multiple low b-values. The IVIM model can be understood as an adaptation of the work of Stejskal and Tanner [Stejskal65] in biological tissue, and was proposed by Le Bihan [LeBihan84]. The model assumes two compartments: a slow moving compartment, where particles diffuse in a Brownian fashion as a consequence of thermal energy, and a fast moving compartment (the vascular compartment), where blood moves as a consequence of a pressure gradient. In the first compartment, the diffusion coefficient is \(\mathbf{D}\) while in the second compartment, a pseudo diffusion term \(\mathbf{D^*}\) is introduced that describes the displacement of the blood elements in an assumed randomly laid out vascular network, at the macroscopic level. According to [LeBihan84], \(\mathbf{D^*}\) is greater than \(\mathbf{D}\).
    The IVIM model expresses the MRI signal as follows:
    \[S(b)=S_0(fe^{-bD^*}+(1-f)e^{-bD})\]
    where \(\mathbf{b}\) is the diffusion gradient weighing value (which is dependent on the measurement parameters), \(\mathbf{S_{0}}\) is the signal in the absence of diffusion gradient sensitization, \(\mathbf{f}\) is the perfusion fraction, \(\mathbf{D}\) is the diffusion coefficient and \(\mathbf{D^*}\) is the pseudo-diffusion constant, due to vascular contributions.
    In the following example we show how to fit the IVIM model on a diffusion-weighted dataset and visualize the diffusion and pseudo diffusion coefficients. First, we import all relevant modules:
    import matplotlib.pyplot as plt

    from dipy.reconst.ivim import IvimModel
    from dipy.data.fetcher import read_ivim
    We get an IVIM dataset using Dipy’s data fetcher read_ivim. This dataset was acquired with 21 b-values in 3 different directions. Volumes corresponding to different directions were registered to each other, and averaged across directions. Thus, this dataset has 4 dimensions, with the length of the last dimension corresponding to the number of b-values. In order to use this model, the data should contain signals measured at a b-value of 0.
    img, gtab = read_ivim()
    The variable img contains a nibabel NIfTI image object (with the data) and gtab contains a GradientTable object (information about the gradients, e.g. b-values and b-vectors). We get the data from img using get_data().
    data = img.get_data()
    print('data.shape (%d, %d, %d, %d)' % data.shape)
    The data has 54 slices, with 256-by-256 voxels in each slice. The fourth dimension corresponds to the b-values in the gtab. Let us visualize the data by taking a slice midway (z=27) at \(\mathbf{b} = 0\).
    z = 27
    b = 20

    plt.imshow(data[:, :, z, b].T, origin='lower', cmap='gray',
               interpolation='nearest')
    plt.axhline(y=100)
    plt.axvline(x=170)
    plt.savefig("ivim_data_slice.png")
    plt.close()
    Heat map of a slice of data
    The region around the intersection of the cross-hairs in the figure contains cerebral spinal fluid (CSF), so it should have a very high \(\mathbf{f}\) and \(\mathbf{D^*}\); the area between the right and left is white matter, so those values should be lower; and the region on the right is gray matter and CSF. That should give us some contrast to see the values varying across the regions.
    x1, x2 = 160, 180
    y1, y2 = 90, 110
    data_slice = data[x1:x2, y1:y2, z, :]

    plt.imshow(data[x1:x2, y1:y2, z, b].T, origin='lower',
               cmap="gray", interpolation='nearest')
    plt.savefig("CSF_slice.png")
    plt.close()
    Heat map of the CSF slice selected.
    Now that we have prepared the datasets we can go forward with the IVIM fit. Instead of fitting the entire volume, we focus on a small section of the slice, as selected above, to fit the IVIM model. First, we instantiate the Ivim model. The fitting uses a two-stage approach: first, a tensor is fit to the data, and then the initial guesses for the parameters \(\mathbf{S_{0}}\) and \(\mathbf{D}\) obtained from this tensor by _estimate_S0_D are used as the starting point for the non-linear fit of the IVIM parameters, using Scipy's leastsq or least_squares function depending on which Scipy version you are using. All initializations for the model, such as split_b, are passed while creating the IvimModel. If you are using Scipy 0.17, you can also set bounds by passing bounds=([0., 0., 0., 0.], [np.inf, 1., 1., 1.]) while initializing the IvimModel. It is recommended that you upgrade to Scipy 0.17, since the fitting results might at times return values which do not make sense physically (for example, a negative \(\mathbf{f}\)).
    ivimmodel = IvimModel(gtab)
    To fit the model, call the fit method and pass the data for fitting.
    ivimfit = ivimmodel.fit(data_slice)
    The fit method creates a IvimFit object which contains the parameters of the model obtained after fitting. These are accessible through the model_params attribute of the IvimFit object. The parameters are arranged as a 4D array, corresponding to the spatial dimensions of the data, and the last dimension (of length 4) corresponding to the model parameters according to the following order : \(\mathbf{S_{0}, f, D^*, D}\).
    ivimparams = ivimfit.model_params
    print("ivimparams.shape : {}".format(ivimparams.shape))
    As we see, we have a 20x20 slice at height z = 27, which gives us 400 voxels. We will now plot the parameters obtained from the fit for one voxel, as well as various maps for the entire slice; this will give us an idea about the diffusion and perfusion in that section. Let (i, j) denote the coordinates of the voxel. We have already fixed the z component at 27.
    i, j = 10, 10
    estimated_params = ivimfit.model_params[i, j, :]
    print(estimated_params)
    Next, we plot the results relative to the model fit. For this we will use the predict method of the IvimFit object to get the estimated signal.
    estimated_signal = ivimfit.predict(gtab)[i, j, :]

    plt.scatter(gtab.bvals, data_slice[i, j, :],
                color="green", label="Actual signal")
    plt.plot(gtab.bvals, estimated_signal, color="red", label="Estimated Signal")
    plt.xlabel("bvalues")
    plt.ylabel("Signals")

    S0_est, f_est, D_star_est, D_est = estimated_params
    text_fit = """Estimated \n S0={:06.3f} f={:06.4f}\n
    D*={:06.5f} D={:06.5f}""".format(S0_est, f_est, D_star_est, D_est)

    plt.text(0.65, 0.50, text_fit, horizontalalignment='center',
             verticalalignment='center', transform=plt.gca().transAxes)
    plt.legend(loc='upper right')
    plt.savefig("ivim_voxel_plot.png")
    plt.close()
    Plot of the signal from one voxel.
    Now we can plot the perfusion and diffusion maps for the slice. We will plot a heatmap showing the values using a colormap. It will be useful to define a plotting function for the heatmap here, since we will use it to plot all the IVIM parameters. We will need to specify the lower and upper limits for our data. For example, the perfusion fractions should be in the range (0,1). Similarly, the diffusion and pseudo-diffusion constants are much smaller than 1. We pass an argument called variable to our plotting function, which gives the label for the plot.
    def plot_map(raw_data, variable, limits, filename):
        lower, upper = limits
        plt.title('Map for {}'.format(variable))
        plt.imshow(raw_data.T, origin='lower', clim=(lower, upper),
                   cmap="gray", interpolation='nearest')
        plt.colorbar()
        plt.savefig(filename)
        plt.close()
    Let us get the various plots so that we can visualize them in one page
    plot_map(ivimfit.S0_predicted, "Predicted S0", (0, 10000), "predicted_S0.png")
    plot_map(data_slice[:, :, 0], "Measured S0", (0, 10000), "measured_S0.png")
    plot_map(ivimfit.perfusion_fraction, "f", (0, 1), "perfusion_fraction.png")
    plot_map(ivimfit.D_star, "D*", (0, 0.01), "perfusion_coeff.png")
    plot_map(ivimfit.D, "D", (0, 0.001), "diffusion_coeff.png")
    Heatmap of S0 predicted from the fit
    Heatmap of measured signal at bvalue = 0.
    Heatmap of perfusion fraction values predicted from the fit
    Heatmap of perfusion coefficients predicted from the fit.
    Heatmap of diffusion coefficients predicted from the fit
    References:
    [Stejskal65] Stejskal, E. O.; Tanner, J. E. (1 January 1965). “Spin Diffusion Measurements: Spin Echoes in the Presence of a Time-Dependent Field Gradient”. The Journal of Chemical Physics 42 (1): 288. Bibcode: 1965JChPh..42..288S. doi:10.1063/1.1695690.
    [LeBihan84] Le Bihan, Denis, et al. “Separation of diffusion and perfusion in intravoxel incoherent motion MR imaging.” Radiology 168.2 (1988): 497-505.
    Example source code
    You can download the full source code of this example. This same script is also included in the dipy source distribution under the doc/examples/ directory.

    by Shahnawaz Ahmed (noreply@blogger.com) at August 03, 2016 02:50 PM

    mr-karan
    (coala)

    GSoC Week 11 Updates

    So this week I was/am busy with making the coala bears website. I had initially decided to make a simple Jekyll static website, which would display data from an external yaml file. But after talking to the community, Lasse, Tushar and I settled on the AngularJS framework, mainly because we love the filters Angular provides. I did not have much experience with Angular, so over the weekend I picked it up and managed to get a fairly decent website with basic functionality ready in 2-3 days. I also discovered ngrok, through which I demoed the website to the community. I will deploy it somewhere permanently once I have finished working on the feedback I got from the community. There are some more things that need to be pushed to make it at least v1.0, so we can have it merged and released.

    I am excited that there are only around 10 days left to submit our work. Looking back at these 3 months, time really flew by, but I learnt a lot thanks to the wonderful coala community. More farewell lines, but a bit later.

    Now is the time for action, not words

    Future Tasks

    • Work on Syntax Highlighting PR since it’s not merged yet.
    • Complete coala-bears website.

    Happy Coding!

    August 03, 2016 12:10 AM

    Yashu Seth
    (pgmpy)

    Linear Gaussian

    Hello, once again. With my project in its last stages, I am wondering how much I will miss this awesome summer. But anyway, I'll spare this post the nostalgia and keep those feelings to myself till the final post :-) .

    Today I’ll be describing linear Gaussian CPDs and the linear Gaussian Bayesian network.

    Linear Gaussian CPD

    A linear Gaussian conditional probability distribution is defined over a continuous variable whose parents are all also continuous. The mean of this variable is linearly dependent on the means of its parent variables, while the variance is independent of the parents.

    For example,

    P(Y | x1, x2, x3) = N(β1*x1_mu + β2*x2_mu + β3*x3_mu + β0 ; σ^2)

    For its representation pgmpy will have a class named LinearGaussianCPD in the module pgmpy.factors.continuous. To instantiate an object of this class, one needs to provide a variable name, the value of the beta_0 term, the variance, a list of the parent variable names, and a list of the coefficient values of the linear equation (beta_vector). The list of parent variable names and the beta_vector are optional and default to None. Let me share some API with you to get a better picture.

    
    Parameters
    ----------
    
    variable: any hashable python object
        The variable whose CPD is defined.
    
    beta_0: int, float
        Represents the constant term in the linear equation.
    
    variance: int, float
        The variance of the variable defined.
    
    evidence: iterable of any hashabale python objects
        An iterable of the parents of the variable. None
        if there are no parents.
    
    beta_vector: iterable of int or float
        An iterable representing the coefficient vector of the linear equation.
    
    Examples
    --------
    
    # For P(Y| X1, X2, X3) = N(-2x1 + 3x2 + 7x3 + 0.2; 9.6)
    
    >>> cpd = LinearGaussianCPD('Y', 0.2, 9.6, ['X1', 'X2', 'X3'], [-2, 3, 7])
    >>> cpd.variable
    'Y'
    >>> cpd.variance
    9.6
    >>> cpd.evidence
    ['X1', 'X2', 'X3']
    >>> cpd.beta_vector
    [-2, 3, 7]
    >>> cpd.beta_0
    0.2
    
    

    Linear Gaussian Bayesian Network

    A Gaussian Bayesian network is defined as a network all of whose variables are continuous and where all of the CPDs are linear Gaussians. These networks are of particular interest as they are an alternate representation of the joint Gaussian distribution.

    These networks are implemented as the LinearGaussianBayesianNetwork class in the module pgmpy.models.continuous. This class is a subclass of the BayesianModel class in pgmpy.models and will inherit most of its methods. It will have a special method known as to_joint_gaussian that will return an equivalent JointGaussianDistribution object for the model. Let me share the API of this method.

    
    >>> from pgmpy.models import LinearGaussianBayesianNetwork
    >>> from pgmpy.factors import LinearGaussianCPD
    >>> model = LinearGaussianBayesianNetwork([('x1', 'x2'), ('x2', 'x3')])
    >>> cpd1 = LinearGaussianCPD('x1', 1, 4)
    >>> cpd2 = LinearGaussianCPD('x2', -5, 4, ['x1'], [0.5])
    >>> cpd3 = LinearGaussianCPD('x3', 4, 3, ['x2'], [-1])
    >>> model.add_cpds(cpd1, cpd2, cpd3)
    >>> jgd = model.to_joint_gaussian()
    >>> jgd.variables
    ['x1', 'x2', 'x3']
    >>> jgd.mean
    array([[ 1. ],
           [-4.5],
           [ 8.5]])
    >>> jgd.covariance
    array([[ 4.,  2., -2.],
           [ 2.,  5., -5.],
           [-2., -5.,  8.]])
    
    

    For more details, you can refer my ongoing PR, #709.

    I hope I kept things simple and interesting. Goodbye, I will be back soon with another post.

    August 03, 2016 12:00 AM

    August 01, 2016

    Ravi Jain
    (MyHDL)

    Finite State Machines

    Well, it's been a tough couple of weeks. Proceedings at my university have caused me to slow down a bit, but I have been making progress. I was working on the RxEngine block when things got too complex, so I decided to take a step back and use finite state machines (FSMs) to develop much simpler and more readable code. As it turns out, I ended up rewriting the TxEngine block from scratch as well. In the midst of all this, the file system in my local repo got too clumsy, as I had multiple versions of the RxEngine implementation, and I planned to wait for the final revision of the blocks to avoid problems with commits later on while rebasing. I shall push the latest code in a day or two for review.
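
    For readers unfamiliar with the approach, here is a minimal, self-contained sketch of what an FSM looks like in MyHDL. It is not the actual TxEngine/RxEngine code; the state names and signals are invented for illustration.

    from myhdl import Signal, enum, always_seq

    def simple_tx_fsm(clk, reset, start, done, busy):
        """Toy transmit FSM: IDLE -> SEND -> DONE -> IDLE."""
        t_state = enum('IDLE', 'SEND', 'DONE')
        state = Signal(t_state.IDLE)

        @always_seq(clk.posedge, reset=reset)
        def logic():
            if state == t_state.IDLE:
                busy.next = False
                if start:
                    state.next = t_state.SEND
            elif state == t_state.SEND:
                busy.next = True
                if done:
                    state.next = t_state.DONE
            else:  # t_state.DONE
                busy.next = False
                state.next = t_state.IDLE

        return logic

    Keeping all transitions in a single clocked process like this is what makes the state machine easy to read and extend, compared to scattered control logic.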

    While implementing the TxEngine block using FSMs, I added the underrun functionality that was missing from the previous implementation. I also did a rough implementation of the flow control block, which accepts requests from the client to send pause control frames and triggers the TxEngine accordingly.

    I also had a discussion with Josy, one of the mentors, about how to provide clocks to sub-blocks and how to handle the reset. He suggested providing clocks to sub-blocks directly in the top block, as opposed to relaying them through the sub-blocks. A good reason I can think of to support this is that, if your system is big and complex, relaying the clock might cause problems in simulation. I shall discuss it in more detail in upcoming posts.


    by ravijain056 at August 01, 2016 07:39 PM

    Sheikh Araf
    (coala)

    [GSoC16] Week 10 update

    I’m at the final stage of my GSoC project, which is writing a coafile editor for the Eclipse plug-in I’ve been working on. After reading a lot of articles and tutorials I’ve finally settled on how to go about writing the editor.

    To implement the editor I’ll be extending the FormEditor class. This approach is helpful because it lets you add one or more FormPage as well as a StructuredTextEditor to view the raw text file.

    Next I’ll be using the Eclipse SWT to implement the GUI of the FormPage. It will look something like this:

    After implementing this if I have some time left I’ll also work on the bear creation wizard for the plug-in.

    August 01, 2016 01:30 PM

    July 31, 2016

    meetshah1995
    (MyHDL)

    Let's build a processor !

    We finally chose Zscale (by Berkeley Architecture Research) as the core for our project, as it had a Verilog implementation and was a simple core using the RISC-V ISA, meeting all our specifications.

    The Zscale core, like any other processor, consists of the standard modules found in a processor:


    • Controller 
    • ALU (Arithmetic and Logic Unit) 
    • Register File 
    • CSR File 
    • Pipeline stages
    • Immediate Generators
    • Muxes
    We have currently ported all the modules to myHDL with tests (with the exception of one module) and we are now assembling them to form the core. 

    Zscale is just like any other processor implementation; the reason such a complex processor can be described in hardware so compactly is the beautifully designed ISA. 

    As I was converting the core to myHDL, I realised that the placement of each and every bit in the ISA was ingeniously planned, which made it easy to write the logic for the processor. 

    To conclude, as we near the completion of this processor, the entire myHDL community will soon have a RISC-V processor which supports RV32I (the crux of the ISA).


    See you next week,
    MS 

    by Meet Pragnesh Shah (noreply@blogger.com) at July 31, 2016 12:55 AM

    July 30, 2016

    Utkarsh
    (pgmpy)

    Monte Carlo Methods

    Monte Carlo methods are a class of algorithms in which we try to approximate numerical results using repeated random sampling. Let us look at a couple of examples to develop some intuition about Monte Carlo methods.

    The first example is about famous Monty Hall problem. For those who don’t know about the Monty Hall problem here is the statement:

    “Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?”

    There are also certain standard assumptions associated with it:

    • The host must always open a door that was not picked by the contestant.

    • The host must always open a door to reveal a goat and never the car.

    • The host must always offer the chance to switch between the originally chosen door and the remaining closed door.

    Now let's try to find the solution of the above problem using a Monte Carlo method. To do so, we need to simulate the procedure (as described in the statement) and calculate probabilities based on the outcomes of these experiments. Don't know about you, but I'm too lazy to simulate this experiment manually :P, so I wrote a Python script which does it on my behalf ;).

    import numpy as np
    # counts the number of times we succeeded on switching
    successes_on_switch = 0
    prior_probs = np.ones(3)/3
    door = ['d1', 'd2', 'd3']
    # since the doors are symmetrical we can run the simulation assuming we always select door 1 (without loss of generality)
    # So now host can choose only door 2 and door 3
    # Running simulation for 1000000 times
    for car_door in np.random.choice(door, size=1000000, p=prior_probs):
        # car is behind door 1
        if car_door == 'd1':
            # we choose door 1 and car is behind door 1, so success with switching is zero
            successes_on_switch += 0.0
        elif car_door == 'd2':
            # we choose door 1 and car is behind door 2, monty can choose only door 3, so success on switching
            successes_on_switch += 1.0
        elif car_door == 'd3':
            # we choose door 1 and car is behind door 3, monty can choose only door 2, so success on switching
            successes_on_switch += 1.0
    success_prob_on_switch = successes_on_switch/1000000.0
    print('probability of success on switching after host has opened a door is:', success_prob_on_switch)
    

    After I ran the script I got the output: probability of success on switching after host has opened a door is: 0.666325. You might get a different output (because of randomness) but it will be approximately the same. The actual solution you get by solving the conditional probabilities is 2/3, which is approximately 0.6666. Evidently, the result is quite a good approximation of the actual answer.

    The next example is about approximating the value of π.

    The method is simple: we choose a square of unit area and the circle inscribed in it. Their areas are in the ratio π/4. Now we generate some random points and count the number of points inside the circle and the total number of points. The ratio of the two counts is an estimate of the ratio of the two areas, which is π/4. Multiplying the result by 4 gives an estimate of π.

    Here is Python code which does the above-mentioned simulation:

    import numpy as np
    x = np.random.rand(7000000) # Taking 7000000 random points in between [0, 1), for x-coordinate
    y = np.random.rand(7000000) # Taking 7000000 random points in between [0, 1), for y-coordinate
    points_in_circle = (np.square(x) + np.square(y) <=1).sum() # points which lie in the circle x^2 + y^2 =1 (circle centred at origin with unit radius)
    pi  = 4 * points_in_circle / 7000000.0
    print(u"The approximate value of π is: ", pi)
    

    The output I got: The approximate value of π is: 3.14158742857, which is close to the actual value of π, 3.14159.

    If you observe both of the above examples, there is a nice common structure to these solutions:

    • First define the input type and its domain for the problem

    • Generate random numbers from the defined input domain

    • Apply deterministic operation over these numbers to get the required result

    Though the above examples are simple to solve, Monte Carlo methods are most useful for obtaining numerical solutions to problems which are too complicated to solve analytically. The most popular class of Monte Carlo methods are Monte Carlo approximations for integration, a.k.a. “Monte Carlo integration”.

    Suppose that we are trying to estimate the integral of a function \(f(x)\) over some domain \(D\):

    \[I = \int_{D} f(x)\, dx\]

    Sometimes these integrals can be solved analytically; when a closed-form solution does not exist, numerical integration methods can be applied. But numerical methods quickly become intractable with even a small number of dimensions, which are quite common in statistics. Monte Carlo integration allows us to calculate an estimate of the value of the integral \(I\).

    Assume that we have a probability density function (PDF) \(p(x)\) defined over the domain \(D\). Then we can re-write the above integral as:

    \[I = \int_{D} \frac{f(x)}{p(x)}\, p(x)\, dx\]

    The above integral is equal to \(E\left[\frac{f(X)}{p(X)}\right]\), the expected value of \(\frac{f(X)}{p(X)}\) with respect to a random variable \(X\) distributed according to \(p(x)\).

    This equality is true for any PDF on \(D\), as long as \(p(x) \neq 0\) whenever \(f(x) \neq 0\). We know that we can estimate an expected value by generating a number of random samples according to the distribution of the random variable and finding their average; as more samples are taken, this value is sure to converge to the expected value.

    In this way we can estimate the value of \(I\) by generating a number of random samples according to \(p\), computing \(f/p\) for each sample, and finding the average of these values. This process is what we call Monte Carlo integration.

    One might worry: what if \(p(x) = 0\) at some point \(x\)? But the probability of generating a sample at such an \(x\) is 0, so none of our samples will cause a problem.

    We can write the above procedure as the following simple steps.

    If the integral is of the form

    \[I = \int_{D} f(x)\, dx\], where \(D\) is the domain:

    • First find the volume of the domain, i.e. \(V = \int_{D} dx\)
    • Choose \(p(x)\) as a uniform distribution over \(D\), and draw \(N\) samples \(x_1, x_2, \ldots, x_N\)

    • Now we can approximate \(I\) as:

    \[I \approx \frac{V}{N} \sum_{i=1}^{N} f(x_i)\]

    Let's use the above method to try approximating the integral of \(e^{x^2/2}\) over the limits (0, 1).

    Let us define \(p(x)\) as the uniform distribution between 0 and 1, i.e. Uniform(0, 1).

    The volume is:

    \[V = \int_{0}^{1} dx = 1\]

    We will now draw N independent samples from this distribution and find the average of \(f(x_i)\) over these samples, which will be our Monte Carlo approximation of \(I\).

    Here is a python code:

    import numpy as np
    N = 1000000  # Number of Samples we want to draw
    x = np.random.rand(N)  # Drawing N samples from p(x)
    Expectation = np.sum(np.exp(x*x / 2)) / N   # Taking average of those samples
    print("The Monte Carlo approximation of integration of e^{x^2/2} for limits (0, 1) is:", Expectation)
    

    The output is: The Monte Carlo approximation of integration of e^{x^2/2} for limits (0, 1) is: 1.19477498217. The actual value of the integral, which I calculated using WolframAlpha, is 1.19496.

    I got a closer estimate when I increased the sample size to 100 million: 1.1949555144469735.

    Let us now try approximating the expected value of a Truncated normal distribution. The truncated normal distribution is the probability distribution of a normally distributed random variable whose value is either bounded below or above (or both)

    The probability density function of the truncated normal distribution is defined as:

    \[f(x; \mu, \sigma, a, b) = \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)}, \quad a \le x \le b\]

    where \(\phi\) is the standard normal density and \(\Phi\) is the cumulative distribution function of the standard normal distribution.

    Now we can approximate the expected value of the truncated normal distribution.

    We will define \(p(x)\) as Uniform(2, 7). The expected value is given by

    \[E[X] = \int_{2}^{7} x\, f(x)\, dx\]

    and the volume is \(V = \int_{2}^{7} dx = 5\).

    Now we will draw \(N\) independent samples \(x_1, \ldots, x_N\) from Uniform(2, 7).

    So, we can approximate the expected value as

    \[E[X] \approx \frac{5}{N} \sum_{i=1}^{N} x_i\, f(x_i)\]

    Here is the python code for the above procedure:

    import scipy.stats
    import numpy as np
    N = 1000000  # 1 million sample size
    x = 5*np.random.rand(N) + 2  # Sampling over uniform (2, 7)
    pdf_vals = scipy.stats.truncnorm.pdf(x, a=2,b=7,loc=3,scale=1)  # f(x; 3, 1, 2, 7)
    
    monte_carlo_expectation = 5 * np.sum(pdf_vals*x)/N
    actual_expectation = scipy.stats.truncnorm.mean(a=2,b=7,loc=3,scale=1)
    
    print("The monte carlo expectation is {} and the actual expectation of Truncated normal distribution f(x; 3, 1, 2, 7) is {}".format(
    monte_carlo_expectation, actual_expectation))
    

    The output of the above code sample was: The monte carlo expectation is 5.365583152790689 and the actual expectation of Truncated normal distribution f(x; 3, 1, 2, 7) is 5.373215532554829.

    Wrapping Up

    In the above examples it was easy to sample from the probability distribution directly. However, in most practical problems the distributions we want to sample from are far more complex. In my upcoming posts I'll cover Markov Chain Monte Carlo, the Metropolis-Hastings algorithm, Hamiltonian Monte Carlo and the No-U-Turn Sampler, which cleverly allow us to sample from sophisticated distributions.

    July 30, 2016 12:00 AM

    July 29, 2016

    Redridge
    (coala)

    Week 8 - 9 - Europython2016 and packaging

    Recap

    Week 8 - 9 - Europython2016 and packaging

    This past week I attended EuroPython 2016. It was the first time I attended a tech conference, so I was impressed by stuff other people might find usual or normal. I also managed to finish the packaging script for the coala-bears. Since both of those announcements come with long stories, I will try to summarize them in their corresponding sections. Let's start with...

    Europython 2016

    The conference was held in Bilbao, Spain this year (where it was also held last year). I was excited to meet the coala team at last. Also, I had never been to Spain before, so adding all that up, it was a promising trip for me.

    On the first day I found out 2 things about Bilbao while we were struggling to find our accommodation: it is a really beautiful city, and very few people actually speak English. The latter didn't matter much, since we were spending most of our day at the conference venue, but it was really funny trying to order food when we were eating out in the evening.

    The EuroPython schedule was as follows: keynotes usually from 9 to 10 a.m., then workshops and/or talks until lunch, and after that more workshops and talks; it was awesome. I have to admit I couldn't be there for "every" keynote, because we were hacking on coala almost every night. The cover picture was taken by our mighty leader Lasse and features part of the coala team after lunch.

    I attended both workshops and talks, but often I had to compromise because there were 5 talk tracks and 2 workshop tracks from which I could choose. Being a beginner with Python myself, I learned about a lot of technologies and how to use them, like TensorFlow and Theano for machine learning.

    All in all, it was a great experience with lots of learning, tasty food, "mostly" interesting conversations and fun group activities (real community bonding).

    Packaging

    Enough with the fun, let's talk a bit about work. I have explained in the past a bit about the need for a packaging tool for bears. The first issue that I encountered was not as trivial as it may sound: choosing the package format. Initially I wanted to go with PyPI, but my project focuses on enabling users to write bears in other languages, which meant that code other than Python had to be packaged, so I decided (with some help from the mentors) to go with conda.

    The second issue was that, in order to avoid code duplication, I couldn't write the tool from scratch. Instead I had to extend an already developed tool (for the second time this GSoC) written by another coalanian. That tool handles packaging and uploading to PyPI for every bear in the existing coala-bears repo; that way we keep bears independent, in case someone wants to install them separately. Fortunately, I could reuse some methods, for example the file generation from templates.

    Now I can proudly announce that you can create a conda package for any bear just by pointing the tool to the bear directory. It will create all the necessary files for you and it will try to fetch the repository URL from your .git/config file (if not possible it will just prompt you for the URL).

    Wrap Up

    I am now entering the last milestone of my GSoC, in which I will create templates for other languages with code stubs already in them (functions for creating Result objects, reading input, sending output). I am thinking that initially I will try to make one for a compiled language (C++) and one for an interpreted language (JavaScript outside the browser, with Node).
    After that I am going to write tutorials on how to use all the tools I developed (or extended) and how to write a bear in a different language. Cya

    by Alexandros Dimos at July 29, 2016 08:25 PM

    Preetwinder
    (ScrapingHub)

    GSoC-4

    Hello,
    This post continues my updates for my work on porting frontera to python2/3 dual support.
    This blog post got delayed, sorry for that. My work on tests continues. The last few days have mostly been focused on getting the existing tests to run on Python 3 on Travis. I also made many changes to the existing PY3 PR after feedback from my mentors, link - https://github.com/scrapinghub/frontera/pull/168. Once this PR is merged, only three tests will fail on Python 3, and frontera should run in single-process mode successfully. Hopefully it will be merged in the next few days. After that, the remaining work is some new tests (mainly for HBase and the workers); I am already working on these, and it shouldn't take more than two or three days. The final job is making changes to HBase, the workers, ZMQ, and the encoders/decoders to work in Python 3. The challenge here was significantly reduced by my mentors' recent decision to convert URLs to an ASCII representation, thus eliminating the need to worry about storing encoding information. So it shouldn't take me long to cover this. I want to spend the final week on finalizing the release and making changes to the documentation.

    GSoC-4 was originally published by preetwinder at preetwinder on July 29, 2016.

    by preetwinder (you@email.com) at July 29, 2016 04:35 PM

    TaylorOshan
    (PySAL)

    Flow Associations and Spatial Autoregressive models

    In the last few weeks I had the opportunity to attend the 2016 SciPy conference with several of my mentors and contributors to the PySAL project. In this time I also completed the three types of spatial weights for flows: network-based weights, proximity-based weights using contiguity of both origins and destinations, and lastly, distance-based weights using a 4-dimensional distance (origin x, origin y, destination x, destination y). These three types of weights can be used within the vector-based Moran's I that was coded in previous weeks to explore spatial autocorrelation, as well as within a spatial autoregressive (lag) model. In the process of building the distance-based weights, I was also able to contribute some speed-ups to the general DistanceBand class, which have been incorporated into the library. Specifically, the DistanceBand class now avoids looping during construction, and there is a build_sp boolean parameter that, when set to False, will provide speed-ups if one is using a relatively large threshold (or no threshold) such that the distance matrix is more dense than sparse.
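
    As a rough usage sketch of that flag (assuming the PySAL 1.x pysal.weights.DistanceBand interface; the parameter values here are only illustrative, not taken from the post):

    import numpy as np
    import pysal

    points = np.random.random((100, 2)) * 10

    # Sparse path (default): fine when the threshold keeps the matrix sparse.
    w_sparse = pysal.weights.DistanceBand(points, threshold=1.5, binary=True)

    # Dense path: with a very large threshold the distance matrix is mostly
    # dense, and build_sp=False can then be the faster option.
    w_dense = pysal.weights.DistanceBand(points, threshold=9999, binary=False,
                                         alpha=-1.5, build_sp=False)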

    More recently, work has focused on developing a version of the spatial lag model where there is a spatial lag for the origin, destination and origin-destination spatial relationships. It looks like it will be possible to extend the existing ml_lag.py script to estimate parameters, though the proper covariance matrix will be more involved. During last week's meeting, my mentors and I discussed several approaches to developing code to carry out the estimation of the covariance matrix, which is what I will continue to work on before pivoting to the final phase of the project, where I will clean up the code and finish the documentation.

    by Taylor Oshan at July 29, 2016 03:26 PM

    jbm950
    (PyDy)

    GSoC Week 11

    Somehow I think I was off by a week. I think last week's blog post covers weeks 9 and 10 and this week's covers week 11. This week I created a full draft for all components of the SymbolicSystem class that will take the place of an equations of motion generator "base class" that was discussed in my project proposal. I began by creating all of the docstrings for the class, followed by the test code. With the documentation and test code written, it was a simple matter to finish off the code for the class itself. Lastly I added documentation in two places in SymPy: one contains the autogenerated documentation from the docstrings, and in the other I adapted an example from PyDy to show how to use the new class.

    After working on SymbolicSystem I decided to try to finish off an old PR of mine regarding the init_printing code that Jason and I had discussed at SciPy. The idea was to build separate dictionaries to pass to the different printers in IPython, based on the parameters that the specific printers take, and to find this information using inspect.getargs(). The problem arose when trying to implement this solution: each printer has only an expr argument and a **settings argument, and the different possible parameters are processed internally by the printer. This means there would not be an elegant way to build dictionaries for each printer.

    The next thing I worked on this week was looking into Jain's version of the order(N) method, as suggested last week. When I started looking over his book, however, I found that it uses a rather different notation than Featherstone and has some additional terms. I have decided to move forward with Featherstone's method, since the summer is coming to an end and I am already familiar with his version of the method. To that end I reread the first part of chapter 7 in Featherstone's book, where he discusses the articulated body method.

    I reviewed two PRs this week. This work was rather quick, as they were simply documentation additions. I verified that the method docstrings matched what the methods actually do and that the module docstring included the different functions present in the file. Having determined that they were correct, I gave the +1 to merge, and both have since been merged.

    Future Directions

    The plan for next week is to focus entirely on the order(N) articulated body method of forming the equations of motion. I plan on writing the three passes for the method as if I have all of the information and methods I need in order to make it work. I expect this to be the best way to determine what additional code I will need, in addition to finding my weak points in how well I understand the method. Once I have a skeleton of how the algorithm is supposed to work, I will stop working directly on the algorithm itself and start working on the peripheral code, such as the joints and body code or the spatial vector processing methods.

    PR’s and Issues

    • (Open) [WIP] Added system.py to physics/mechanics PR #11431
    • (Merged) Added docstrings to delta and mid property methods PR #11432
    • (Merged) Added top-level docstring for singularities.py PR #11440

    July 29, 2016 12:00 AM

    July 28, 2016

    Ranveer Aggarwal
    (dipy)

    Further Progress on Orbital Menu

    Last week, we began building an orbital menu. There is now some progress, with the basic elements working well with it.
    We now have text, a button and cubes working directly with the orbital menu, and they work very similarly to the 2D GUI.

    Building the Menu

    Using the ideas from Marc’s example, the Assembly-Follower combination was integrated into our existing code. The assembly has the object, an orbit (a disk) and, of course, the parts. We allotted these cubes positions at angles of 360/(number of parts) degrees to the X-axis. This is how it looks:

    The Orbital Menu
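
    The placement math described above can be sketched roughly as follows (a toy illustration only, not the actual dipy/VTK code): spread n parts evenly around a circle of a given radius in the X-Y plane.

    import math

    def orbit_positions(n_parts, radius):
        positions = []
        for i in range(n_parts):
            angle = math.radians(i * 360.0 / n_parts)  # equal angular spacing
            positions.append((radius * math.cos(angle),
                              radius * math.sin(angle),
                              0.0))
        return positions

    print(orbit_positions(4, 10))  # four elements at 0, 90, 180 and 270 degrees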

    We can also add other types of elements on the menu:

    A More Complex Orbital Menu

    The possibilities are endless! :D

    ToDo

    • Sliders currently don’t work because sliding in 3D is a bit complex.
    • There is a need to explore better methods to allot elements to the orbit.

    July 28, 2016 10:42 AM

    Shridhar Mishra
    (italian mars society)

    Ironpython and Unity

    Using Ironpython with Unity game engine.

    We already know that we can use Python to make .NET internal calls.
    Now we can use the same approach to start a console that accepts a scripting language in the Unity engine.
    To do this we have to include certain DLL files.
    These DLL files must be present in Assets > Plugins:
    • IronPython.dll
    • IronPython.Modules.dll
    • Microsoft.Scripting.Core.dll
    • Microsoft.Scripting.dll
    • Microsoft.Scripting.Debugging.dll
    • Microsoft.Scripting.ExtensionAttribute.dll
    • Microsoft.Dynamic.dll

    Once the plugins are in place, initialize the engine on the C# side:

    PythonEngine engine = new PythonEngine();
    engine.LoadAssembly(Assembly.GetAssembly(typeof(GameObject)));
    engine.ExecuteFile("Test.py");

    where Test.py is the Python script.

    On the Python side:

    import UnityEngine
    from UnityEngine import *

    Debug.Log("Hello world from IronPython!")

    More info coming soon!



    by Shridhar Mishra (noreply@blogger.com) at July 28, 2016 05:30 AM

    Aron Barreira Bordin
    (ScrapingHub)

    Scrapy-Streaming [6/7] - Scrapy With Java

    Hi,

    In these weeks, I've implemented the Java library for developing Scrapy spiders. Now you can develop Scrapy spiders easily using the scrapystreaming lib.

    About

    It's a helper library that supports the development of external spiders in Java. It allows you to create the Scrapy Streaming JSON messages using Java objects and methods.

    Docs

    You can read the official docs here: http://gsoc2016.readthedocs.io/en/latest/java.html

    Examples

    I’ve added a few examples about it, and a quickstart section in the documentation.

    PRs

    • R package PR: https://github.com/scrapy-plugins/scrapy-streaming/pull/9
    • Examples PR: https://github.com/scrapy-plugins/scrapy-streaming/pull/4
    • Docs PR: https://github.com/scrapy-plugins/scrapy-streaming/pull/7

    Thanks for reading,

    Aron.

    Scrapy-Streaming [6/7] - Scrapy With Java was originally published by Aron Bordin at GSoC 2016 on July 27, 2016.

    by Aron Bordin (aron.bordin@gmail.com) at July 28, 2016 03:54 AM

    fiona
    (MDAnalysis)

    The private life of cats

    Welcome back to another fun round of Python-wrangling (a.k.a. what I’ve learnt to do with Python during GSoC)! Today I’ll be talking about ‘private’ variables and the property attribute.

    If you’re not familiar with Python but want to follow along, you could have a look back at the brief notes I made back here, or go check out a (proper) tutorial.

    ‘Private’ and ‘public’ in Python

    Let’s say we have a House class. For many instances of House – including (of course) ours – it’ll have a cat attribute. Within House, we can interact with the cat as self.cat, so we can define the necessary methods for looking after them (feed, clean, worship, feed again…). But the cat is also accessible from outside the house (as house_instance.cat): this means anyone could come along and look at or, heaven forbid, swap our cat with something else – even something non-feline! A house’s cat is ‘public’, where we might prefer they be ‘private’.

    Many programming languages have ways of declaring variables as ‘private’ (more or less, which things can’t be used outside of the current bit of code). In Python, ‘private’ variables don’t strictly exist. Instead, we follow a convention: if we name something beginning with an underscore (say, _cat), we’re indicating that object shouldn’t be considered public and so shouldn’t be used outside of e.g. the class it’s an attribute of. People can still technically access and change our cat through our_house._cat, but we’re politely asking them not to.

    (The use of underscores isn’t just for when we don’t want people to mess with attributes, it can also be for things which don’t really serve a direct purpose to the user – say, a method that performs an intermediate step, and so isn’t useful by itself. By including the leading underscore, we can indicate to a user that they need not worry about this bit of the code, and it simplifies documentation. There are also a couple other uses for leading and trailing underscores in Python – you could see more about that in here.)

    But what if we decide we do want people to see our cat? (He is, after all, the very best cat). What we can do is make a property attribute, cat, that returns the value of _cat when we want its value, but will throw an error if we try to set it. We can do this using @property, the property decorator.
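
    The original post's snippet isn't reproduced here, but a minimal sketch of this idea looks like the following (the class and attribute names are just for illustration):

    class House:
        def __init__(self, cat):
            self._cat = cat          # 'private' by convention

        @property
        def cat(self):
            """Read-only view of our cat."""
            return self._cat

    our_house = House("Whiskers")
    print(our_house.cat)             # reading is fine
    our_house.cat = "a fish"         # AttributeError: can't set attribute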

    A tangent on decorators

    There may arise situations when using Python where we might want to ‘modify’ a bunch of functions/methods in the same way. For example, say we’ve programmed our Python friend to take care of several cat-related chores at certain times; but as any cat owner knows, as soon as you do something for your cat, you’re probably going to need to do it again pretty soon.

    We want to modify all the cat-caring actions so that instead of performing them just once at the specified time, there's a five minute delay and then the action is performed a second time. To do this, we'll use a decorator.
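
    Again, the original snippet isn't shown here, but a toy decorator along these lines might look like this (the chore name and the five-minute delay are just illustrative):

    import functools
    import time

    def do_it_again(delay=300):
        """Decorator: perform the chore, wait, then perform it a second time."""
        def decorator(chore):
            @functools.wraps(chore)
            def wrapper(*args, **kwargs):
                chore(*args, **kwargs)
                time.sleep(delay)    # the cat will demand a repeat soon
                return chore(*args, **kwargs)
            return wrapper
        return decorator

    @do_it_again(delay=300)
    def feed_cat():
        print("Feeding the cat...")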

    property is a built-in decorator in Python that we can apply to a 'getter' method (and optionally, additional 'setter' and 'deleter' methods – more on these below) to get a property attribute which, instead of having a stored value like other attributes, will return the result of the 'getter' method when its value is asked for. Above, we're using a simple 'getter' that just returns the value of another attribute, but we could have something that, say, returns something different depending on the values of other attributes, or calculates a new value – see some of the examples below.

    Conditional setting

    Back to our House/cat situation: we’ve prevented anyone being able to come along and swap our cat for a fish, but what if we did want to let people change him, so long as we can control what he’s changed to? Obviously, we still want a cat, and let’s say we’ll also only accept the change if the proposed new cat is cuter than our current one. Having used @property to create a cat property, we can now add a ‘setter’ for cat using @cat.setter. Now, rather than throwing an error when we try house_instance.cat = new_value, this ‘setter’ method will be run; depending on new_value, there may or may not be any change.

    We could similarly use @cat.deleter to specify a 'deleter' to run when we use del our_house.cat – for example, when we move house we can use this to both remove the _cat attribute and perform clean-up tasks like removing all the cat hair or uninstalling the cat door.
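
    The post's own snippets aren't shown here, but a sketch of the setter and deleter could look like this (the Cat class and the cuteness comparison are invented for the example):

    class Cat:
        def __init__(self, name, cuteness):
            self.name = name
            self.cuteness = cuteness

    class House:
        def __init__(self, cat):
            self._cat = cat

        @property
        def cat(self):
            return self._cat

        @cat.setter
        def cat(self, new_cat):
            # Only accept the swap if the newcomer is a cat and cuter than ours.
            if isinstance(new_cat, Cat) and new_cat.cuteness > self._cat.cuteness:
                self._cat = new_cat

        @cat.deleter
        def cat(self):
            # Moving out: remove the cat and do the clean-up chores.
            print("Removing the cat hair, uninstalling the cat door...")
            del self._cat

    our_house = House(Cat("Whiskers", cuteness=9))
    our_house.cat = "a fish"                     # silently rejected: not a cat
    our_house.cat = Cat("Mittens", cuteness=10)  # accepted: cuter
    del our_house.cat                            # runs the deleter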

    Some practical examples

    On the more practical side of things, here are some snippets showing how I've been using getters, setters and 'private' attributes in various ways for my GSoC project, namely the 'add_auxiliary' part. Again, you can go see the full version on GitHub.

    Some brief background - the main interface when reading auxiliary data is the auxiliary reader class (there will be different reader classes for different auxiliary formats, but they all inherit from AuxReader, where all the common bits are defined); the reader has an auxstep attribute which is an ‘auxiliary step’, which stores the data and time for each step (again, there are different auxiliary step classes for different formats, with AuxStep having the common bits).


    And that’s a brief rundown of private vs. public and properties in Python - I hope you found that fun and informative!

    See you next time!

    July 28, 2016 12:00 AM

    July 27, 2016

    Adrianzatreanu
    (coala)

    Back Home

    Ah, what a wonderful trip it has been. I just came back from EuroPython 2016 in Bilbao, which ended 3 days ago. And I loved it. Not only was I with wonderful people whom I finally got the chance to meet in person, but the conference was also amazing!

    So the trip was cool?

    Yes. It was really good. The conference was full of amazing talks, sprints, and, not to forget, lightning talks! The only things that disappointed me were the workshops. Some of them were too hard for me to attend, and some of those I did attend were pretty bad: it usually took us half of the time just to set up their tools, with everyone having dependency issues and lots of time wasted.

    But what about coding?

    Hehe, in 7 days at the conference I probably coded as much as I did in 14 at home. The productivity of the late-night coding sessions was quite high, since I had people near me to help with any question. So instead of working a day on something and reworking it the next day, I could take the best approach and do it right the first day. Cool, huh?

    GSoC progress

    I can safely say that the project is almost done overall, with only the last grunt-work things yet to be done. The upload tool works really well, and it's merged. The installation tool is quite close and does what it should. What's left? Filling in all the requirements. And this is the worst part: I now have to fill in the correct requirements for each bear. Once that is done, I see my project finished in no time :)


    by adrianzatreanu at July 27, 2016 01:16 PM

    mr-karan
    (coala)

    GSoC Week 10 Updates

    This has been a good week. I have finally been able to get some success with my Syntax Highlighting project. Before we dive into the inner details of how it unfolded, I want to put this quote here.

    “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” - Bill Gates

    I had been working on this task on and off for almost a month now. I started by diving deep into the code-base, which is like finding a needle in a haystack. Once I got to the place where I needed to make the change, the process after that wasn't easy for me: I had to understand what different functions did in order to incorporate syntax highlighting. I am using the Pygments library for this task. I really like the library and it has helped make this process a lot simpler. After experimenting a lot with Pygments I got the required code in shape, ready to be plugged in. I was able to get syntax highlighting in my terminal, but I wouldn't call it a moment of joy yet, as I still had to do something about adding bullet marks on spaces/tabs like in the previous version. I somehow stumbled upon VisibleWhitespaceFilter, which was exactly what I was looking for. Since it made my work easy, I thought to implement something extra and added background highlighting for result.message by overriding Style from the pygments.style class.
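
    For illustration, here is a minimal sketch of the Pygments pieces mentioned above (this is not coala's actual code; the background-highlighting part, done by subclassing pygments.style.Style, is omitted):

    from pygments import highlight
    from pygments.lexers import PythonLexer
    from pygments.formatters import TerminalFormatter
    from pygments.filters import VisibleWhitespaceFilter

    code = "def greet(name):\n\treturn 'Hello ' + name\n"

    lexer = PythonLexer()
    # Render spaces/tabs as visible marks, like the bullet marks mentioned above.
    lexer.add_filter(VisibleWhitespaceFilter(spaces=True, tabs=True))

    print(highlight(code, lexer, TerminalFormatter()))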

    You can see the whole thing in action here.

    The PR is currently in review, but I am glad that it finally worked and my GSoC project is nearing completion. Oh, and coming back to the quote at the beginning: well, I always believe in not reinventing the wheel, and above I have shown how you can implement some cool stuff in as few LOC as possible.

    Future Tasks

    • Get this PR accepted.
    • Complete some more bears
    • Get started on the coala-bears website

    Happy Coding!

    July 27, 2016 12:10 AM

    July 26, 2016

    Upendra Kumar
    (Core Python)

    Hello everyone. Needed Feedback for pip_tkinter.

    Hello, my fellow GSoCers. Hope you are all having a very good time and regularly check the blog feeds at terri.toybox.ca/python-soc/.

    I need feedback on my tkinter-based pip GUI application in order to improve it further. Your feedback can be very valuable and will help me learn what people expect from this project. Let me tell you about it:

    We have made a preliminary version of a GUI for pip. This project is intended to provide a GUI version of “pip” (target audience: beginners in Python or people who are not familiar with the command line).

    How to install pip_tkinter?

    Please post as many issues and suggestions here : https://github.com/upendra-k14/pip_gui/issues

    The project idea is discussed on these issues on Python Bug Tracker :
    1. Issue #23551 : IDLE to provide menu link to PIPgui.

    2. Issue #27051 : Create PIP GUI

    The GitHub repo of the project : https://github.com/upendra-k14/pip_gui/tree/dump_code


    by scorpiocoder at July 26, 2016 08:44 PM

    Pulkit Goyal
    (Mercurial)

    Iterating Dictionaries

    Dictionaries, also known as hash tables, are one of the basic data structures we use in programming and hence a built-in data type in Python. A dictionary is defined as a set of key-value pairs. The Python library functions for reading dictionary items have undergone changes in implementation with newer versions of the language.
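
    As a quick illustration of the kind of change referred to here (standard Python 2 vs. 3 behaviour, not Mercurial-specific code):

    d = {'a': 1, 'b': 2}

    # Python 2: d.items() builds a list, d.iteritems() returns an iterator.
    # Python 3: iteritems() is gone and d.items() returns a lazy view object.
    for key, value in d.items():
        print(key, value)

    # Code that must run on both versions (as Mercurial's does) therefore often
    # avoids iteritems() or goes through a compatibility layer.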

    July 26, 2016 12:30 PM

    Karan_Saxena
    (italian mars society)

    More updates!

    Time for updates!

    This period has been phenomenal.

    1) I am finally able to process Full HD RGBA Frame in OpenCV.
    2) Using PykinectTk, the process of estimating user movements is being done.

    More updates to follow.

    Onwards and Upwards!!

    by Karan Saxena (noreply@blogger.com) at July 26, 2016 03:16 AM

    Leland Bybee
    (Statsmodels)

    Examples

    At this point a version of the distributed estimation code is complete and it is worth spending some time detailing how it can be used through some examples. To use the distributed estimation code you need to initialize a DistributedModel instance by providing a generator to produce exog and endog for each machine. Additionally, the estimation method as well as join method need to be provided, and any corresponding arguments. The following example shows how this works for OLS using the debiasing procedure:

    from statsmodels.base.distributed_estimation import DistributedModel
    
    def _exog_gen(exog, partitions):
        """partitions exog data"""
    
        n_exog = exog.shape[0]
        n_part = np.ceil(n_exog / partitions)
    
        ii = 0
        while ii < n_exog:
            jj = int(min(ii + n_part, n_exog))
            yield exog[ii:jj, :]
            ii += int(n_part)
    
    def _endog_gen(endog, partitions):
        """partitions endog data"""
    
        n_endog = endog.shape[0]
        n_part = np.ceil(n_endog / partitions)
    
        ii = 0
        while ii < n_endog:
            jj = int(min(ii + n_part, n_endog))
            yield endog[ii:jj]
            ii += int(n_part)
    
    
    debiased_mod = DistributedModel(zip(_endog_gen(y, m), _exog_gen(X, m)), m,
                                    model_class = OLS,
                                    estimation_method=_est_debiased,
                                    join_method=_join_debiased)
    

    Note that this is actually the default for DistributedModel, so

    debiased_mod = DistributedModel(zip(_endog_gen(y, m), _exog_gen(X, m)), m)
    

    would give the same thing. Then to fit the model you just call

    debiased_params = debiased_mod.fit(fit_kwds={"alpha": 0.2})
    

    fit_kwds needs to be specified (here, for the regularization procedure) because we don't want to restrict the fit procedures that are allowed. To get an idea of what a slightly more complicated DistributedModel might look like, consider the setup for logistic regression:

    debiased_mod = DistributedModel(zip(_endog_gen(y, m), _exog_gen(X, m)), m,
                                    model_class=GLM,
                                    init_kwds={"family": Binomial()})
    

    At this point it is probably worth noting that for the examples above everything is going to be run sequentially. That means that each partition is handled in sequence. The use case here is for instances where a data set is too large to fit into memory. However, we also have support for truly distributed estimation that includes parallel computing. Currently, this is all handled through joblib. Which approach is used is controlled by the parallel_method argument to fit:

    joblib_params = mod.fit(parallel_method="joblib", fit_kwds={"alpha": 0.2})
    

    To explicitly use the sequential approach, set parallel_method="sequential". One nice thing about using joblib is that it allows for some flexibility in the backend used. This means that if you have a computing cluster that is set up with something like distributed, you can use that as well. For example, with distributed:

from joblib.parallel import parallel_backend, register_parallel_backend
from distributed.joblib import DistributedBackend

register_parallel_backend('distributed', DistributedBackend)
backend = parallel_backend('distributed')
dask_params = mod.fit(parallel_method="joblib",
                      parallel_backend=backend,
                      fit_kwds={"alpha": 0.2})
    

To wrap up, I wanted to include a couple of plots that focus on the debiasing procedure to show how it can perform compared to a naive averaging approach and against the global Lasso estimate. There are two plots: the first shows the performance in L2 error for different values of N with a fixed m (number of machines) and fixed p (number of variables). The second shows the same thing but for a fixed N and variable m. When N is fixed it is 1000, when m is fixed it is 5, and when p is fixed (always) it is 100. It is also worth noting that thresholding was applied to the debiased parameters, as recommended by the source paper; this also gives an example of a case where join_kwds are used:

    debiased_mod = DistributedModel(zip(_endog_gen(y, m), _exog_gen(X, m)), m,
                                    join_kwds={"threshold": 0.1})
    debiased_params = debiased_mod.fit(fit_kwds={"alpha": 0.2})
    

(Figure: L2 error for varying N, with m and p fixed.)

(Figure: L2 error for varying m, with N and p fixed.)

In both cases the results shown by the plots make sense. Both the naive and debiased procedures improve as N increases, but the debiased one converges faster to the global estimate. Similarly, both deteriorate as m increases, but the naive procedure consistently does worse.
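For reference, the y, X and m used throughout the examples above can be generated, for instance, like this (the sizes and sparsity pattern here are illustrative choices of mine, not the exact settings behind the plots):

import numpy as np

rng = np.random.RandomState(0)
N, p, m = 1000, 100, 5            # observations, variables, machines
beta = np.zeros(p)
beta[:10] = 1.0                   # a handful of non-zero coefficients
X = rng.randn(N, p)
y = X.dot(beta) + rng.randn(N)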

    July 26, 2016 12:00 AM

    mike1808
    (ScrapingHub)

    GSOC 2016 #4: Sneaky bug

In this blog post I want to tell you about a bug in Splash that went undiscovered for almost a year.

    First meeting

For the past two weeks I was working on the HTMLElement class, which makes working with HTML DOM elements easier. I thought I had almost finished it, but when I started to write tests, something strange happened. In that test I selected several DOM elements using splash:select and asserted their HTML node types. I had 5 different elements: p, input, div, span and button.

    function main(splash)
        assert(splash:go(splash.args.url))
        assert(splash:wait(0.5))
        
        local p = splash:select('p')
        local input = splash:select('input')
        local div = splash:select('div')
        local span = splash:select('span')
        local button = splash:select('button')
        
        
        return {
            p=p:node_property('nodeName'):lower(),
            input=input:node_property('nodeName'):lower(),
            div=div:node_property('nodeName'):lower(),
            span=span:node_property('nodeName'):lower(),
            button=button:node_property('nodeName'):lower(),
        }
    end
    

The weird thing was that the actual node types returned by Splash were p, button, button, button, button.

As you can see, the test failed. Notice also that only the first element had the correct type; the other ones had the type of the last element. To test that, I tried swapping some splash:select calls. The result was the same: the first value was correct and the rest had the type of the last splash:select.

    Investigation

After some thought I assumed that the issue was in some method that becomes shared (effectively static) across all instances of HTMLElement or _ExposedElement. I examined both classes but didn’t find any strange initialization that overrides the class methods. To confirm my thoughts I logged every splash:select and element:node_property call to see the instance on which these methods were called. It turned out that only the first and the last instances of _ExposedElement were used. So the issue had to be in the code that calls these methods.

Where are those functions called? From Lua. For a moment I thought that our Lua runner (lupa) was broken (because there is an unfixed bug in it), but that idea was quickly thrown away. I wondered whether this bug was in our Lua wrapper code, in which case it had to show itself somewhere else too. At that moment the only other thing that could go wrong was the return value of splash:call_later, because it creates an instance of _ExposedTimer, which is the only class that can be created as many times as you want (on the contrary, the Splash, Response, Request and Extras classes are created once during the Lua script execution). I initialized several timers and wrote a simple test to check whether my assumption about the bug was right. And it was confirmed: the bug was in our Lua wrappers, because I got the same issue with the instances of _ExposedTimer.

    I started examining methods of wraputils.lua and noticed several strange things:

1. Metamethods are initialized on the prototype table after each setup_property_access call.

2. In the metamethods for getters/setters we use self, but the other properties are retrieved from, and assigned to, the cls.

So what was happening? Why did the first splash:select element work correctly while the other ones, except the last, did not? The answer is pretty obvious. During the first call of splash:select the metamethods for Element weren’t set yet and hence weren’t called, so everything worked as it should. However, after the first call we set those metamethods, so on every subsequent call they are invoked when we assign methods to an instance of Element, and in the __newindex metamethod we set that method on Element itself. So executing span:node_property('nodeName') actually calls Element:node_property because of our __index metamethod.
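For readers more comfortable with Python than with Lua metatables, here is a rough Python analogy of the same failure mode (an illustration I am adding, not Splash code): writing per-instance state onto the class makes every object share whatever was written last.

class Element(object):
    def bind(self, node_property):
        # BUG: assigning to the class, so all instances end up sharing
        # the value written by the most recent call
        type(self).node_property = node_property

    def bind_fixed(self, node_property):
        # FIX: assign to the instance itself (the Lua fix similarly uses
        # rawset/rawget on self instead of the prototype table)
        self.node_property = node_property

span, button = Element(), Element()
span.bind(lambda: "span")
button.bind(lambda: "button")
print(span.node_property())   # prints "button" -- the bug in miniature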

    Solution

After understanding why that happened, the solution came to mind very quickly: assign getters/setters to self and call rawget and rawset on self. Which is what was done in my PR.

    Conclusion

It was a very interesting bug. While working on it I learned many things about how OOP and metamethods work in Lua. I hope I’ll meet this kind of challenging task in my future work on Splash.

    July 26, 2016 12:00 AM

    July 25, 2016

    Valera Likhosherstov
    (Statsmodels)

    GSoC 2016 #4

    Time Varying Parameters

Let's consider the following process:

    y_t = x_t' * beta_t + e_t          (1)
    beta_t = beta_{t-1} + v_t          (2)

Here y_t is the observed process, x_t is an exogenous vector, and beta_t are the so-called time varying parameters, which change over time as equation (2) states. e_t and v_t are white noise terms: e_t ~ N(0, sigma^2) and v_t ~ N(0, Q).
This model is known as the Time-Varying-Parameter (TVP) model, and it was a part of my proposal. As you can see, it is non-switching, but it is used to obtain good starting parameters for the switching model's likelihood optimization.
The TVP and MS-TVP models turned out to be the easiest and most pleasant items of my proposal. Due to their simplicity I didn't have any difficulties implementing and debugging them, and during MLE their parameters converged nicely to the expected values.
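To make equations (1)-(2) concrete, here is a minimal NumPy sketch that simulates such a process (the dimensions and noise scales are arbitrary illustration values of mine):

import numpy as np

rng = np.random.RandomState(0)
T, k = 200, 3                       # observations and regressors
sigma2 = 1.0                        # observation noise variance
Q = 0.01 * np.eye(k)                # transition noise covariance

x = rng.randn(T, k)                 # exogenous regressors
beta = np.zeros((T, k))             # time varying parameters
y = np.zeros(T)

for t in range(1, T):
    beta[t] = beta[t - 1] + rng.multivariate_normal(np.zeros(k), Q)   # equation (2)
    y[t] = x[t].dot(beta[t]) + np.sqrt(sigma2) * rng.randn()          # equation (1)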

    TVP: Implementation and testing

The TVP model was implemented in the upper-level statespace module (the tvp.py file), rather than in regime_switching. The implementation is a concise extension of the MLEModel class. I used Kim and Nelson's (1989) model of changing conditional variance, or uncertainty, in U.S. monetary growth ([1], chapter 3.4) as a functional test and for an iPython notebook demonstration.
A special thing about TVP is that its MLE results class (TVPResults) has a plot_coefficient method, which draws a nice plot of the time varying parameters as they change over time:

(Figure: plot of estimated time varying coefficients.)

    Heteroskedastic disturbances

Adding heteroskedastic disturbances to observation equation (1) makes the model regime-switching:

    e_t ~ N(0, sigma_{S_t}^2)

where S_t is a Markov regime process.

    MS-TVP: Implementation and testing

The TVP model with heteroskedastic disturbances is implemented in the switching_tvp.py file of the regime_switching module. It is as concise and elegant as its non-switching analog. I'm going to implement coefficient plotting soon.
I used Kim's (1993) time-varying-parameter model with heteroskedastic disturbances for U.S. monetary growth uncertainty to perform functional testing. One nice thing about MS-TVP is that it finds a near-correct likelihood maximum from a non-switching start; as you can see in the tests.test_switching_tvp.TestKim1993_MLEFitNonswitchingFirst class, I use a 0.05% relative tolerance.

    What's next?

The remaining part of the summer will be about improving and polishing the existing models. Right now I am working on adding heteroskedastic disturbances to transition equation (2). As I noted above, I also have to add coefficient plotting for the switching model. Other goals are an MS-TVP notebook demonstration and overall improvement of the MS-AR model.

    Literature

    [1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

    by Valera Likhosherstov (noreply@blogger.com) at July 25, 2016 01:57 PM

    Riddhish Bhalodia
    (dipy)

    Brain Extraction Explained!

    As promised I will outline the algorithm we are following for the brain extraction using a template, which is actually a combination of elements taken from [1] and [2].

    Step 1

Read the input data, the input affine information, the template data, the template data mask, and the template affine information.

    Step 2

We perform registration of the template data onto the input data; this involves two sub-steps.

    (2.a) Affine Registration

    Perform the affine registration of template onto the input and get the transformation matrix which will be used in the next step

    (2.b) Non-Linear Registration (Diffeomorphic Registration)

    Using the above affine transform matrix as the pre-align information we perform the diffeomorphic registration of the template over the input.

These two steps get most of it done! (This is also the approach followed in [2].)
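A sketch of what these two registration sub-steps look like with dipy's registration API (the variable names come from Step 1, and the parameter values are common defaults rather than the exact settings used in the branch):

from dipy.align.imaffine import AffineRegistration, MutualInformationMetric
from dipy.align.transforms import AffineTransform3D
from dipy.align.imwarp import SymmetricDiffeomorphicRegistration
from dipy.align.metrics import CCMetric

# (2.a) affine registration of the template onto the input
affreg = AffineRegistration(metric=MutualInformationMetric(nbins=32),
                            level_iters=[10000, 1000, 100],
                            sigmas=[3.0, 1.0, 0.0], factors=[4, 2, 1])
affine_map = affreg.optimize(input_data, template_data, AffineTransform3D(),
                             None, input_affine, template_affine)

# (2.b) diffeomorphic registration, pre-aligned with the affine result
sdr = SymmetricDiffeomorphicRegistration(metric=CCMetric(3), level_iters=[10, 10, 5])
mapping = sdr.optimize(input_data, template_data, input_affine, template_affine,
                       prealign=affine_map.affine)

warped_template = mapping.transform(template_data)
warped_mask = mapping.transform(template_mask, interpolation='nearest')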

    Step 3

We use the transformed template and the input data in a non-local patch similarity method for assigning labels to the input data; this part is taken from [1].

    This is it! The branch for brain extraction is here

    Experiments and Results

I am currently experimenting with the NITRC IBSR data, which comes with manual brain extractions. This will help me validate the correctness of the algorithm.

(Figure: one of the brain extraction results. Looks good except for the edges.)

    Next Up…

    • Functional tests for the brain extraction process
    • More datasets, even the harder ones
    • Refining the code
    • Better measure for validation

    References

    [1]“BEaST:Brain extraction based on nonlocal segmentation technique”
    Simon Fristed Eskildsen, Pierrick Coupé, Vladimir Fonov, José V. Manjón, Kelvin K. Leung, Nicolas Guizard, Shafik N. Wassef, Lasse Riis Østergaard and D. Louis Collins  NeuroImage, Volume 59, Issue 3, pp. 2362–2373.
    http://dx.doi.org/10.1016/j.neuroimage.2011.09.012

    [2] “Optimized Brain Extraction for Pathological Brains (OptiBET)”
    Evan S. Lutkenhoff, Matthew Rosenberg, Jeffrey Chiang, Kunyu Zhang, John D. Pickard, Adrian M. Owen, Martin M. Monti
    December 16, 2014 PLOS http://dx.doi.org/10.1371/journal.pone.0115551


    by riddhishbgsoc2016 at July 25, 2016 08:29 AM

    Aakash Rajpal
    (italian mars society)

    Oculus working yaay

    Hey everyone, Sorry for this late post.

I was busy setting up the Oculus. It has been a pain, but in the end a sweet one :p. A week ago I was down, thinking of even quitting the program. I had my code ready to run, but it just wouldn’t show up on the Oculus. I was lost, but somewhere inside I knew I could do it. So I got up one last time, sat through the day tweaking my code, tweaking the Blender Game Engine, changing the configuration for the Oculus, and at last: Bazzingaa.

Thank God, I said to myself, and eventually my code was running on the Oculus :p.

    Here is a link to the DEMO VIDEO GSOC


    by aakashgsoc at July 25, 2016 08:01 AM

    Levi John Wolf
    (PySAL)

    A Post-SciPy Chicago Update

    After a bit of a whirlwind, going to SciPy and then relocating to Chicago for a bit, I figure I’ve collected enough thoughts to update on my summer of code project, as well as some of the discussion we’ve had in the library recently.

    I’ve actually seen a lot of feedback on quite a bit of my postings since my post on handling burnout as a graduate student. But, I’ve been forgetting to tag posts so that they’d show up in the GSOC aggregator! Bummer!

    The Great Divide

Right before SciPy, a contributor suggested that it might be a reasonable idea to split the library up into independent packages. Ostensibly motivated by this conversation on Twitter, the suggestion highlighted a few issues (I think) with how PySAL operates, on a normative level, on a procedural level, and in our code. This is an interesting suggestion, and I think it has a few very strong benefits.

Lower Maintenance Surface

Chief among the benefits is that minimizing the maintenance burden makes academic developers much more productive. This is something I’m actually baffled by in our current library. I understand that technical debt is hard to overcome and that some parts of the library may not exist had we started now rather than five years ago. But, it’s so much easier to swap in ecosystem-standard packages than it is to continue maintaining code that few people understand. This is also much more true when you recognize that our library does, in many places, exhibit effective use of duck typing. The barrier to us using something like pygeoif or shapely as a computational geometry core is primarily mental, and conversion of the library to drop/wrap unnecessary code in cg, weights, and core would take less than a week of full-time work. And, it’d strongly lower the maintenance footprint of the library, which I think is a central benefit of the split-package suggestion.

    Clearer Academic Crediting

Plus, the idea that splitting up the library into many, more loosely-coupled packages seems like a stroke towards the R-style ecosystem, which is exactly what the linked Twitter thread suggests. But, I think that R actually has some comfy structural incentives for the drivers of its ecosystem to do what they do. Since an academic can make a barely-maintained package that does some unique statistical operation and get a Journal of Statistical Software article out of it, the academic-heavy ecosystem in R is angled towards this kind of development. And, indeed, with a very small maintenance surface, these tiny packages get shipped, placed on a CV, and occasionally updated. Thus, the social incentives align to generate a particular technical structure, something I think Hadley overstates in that brief conversation as a product of object oriented programming. While OO isn’t a perfect abstraction, I’m kind of done with blaming OO for everything I don’t like, and I think that the claim that OO encourages monolithic packages is, on its face, not a necessary conclusion. It comes down to defining efficient interfaces between classes and exposing a consistent, formal API. I don’t really think it matters whether that API is populated or driven using functions & immutable data or objects & bound methods. Closures & Objects are two sides of the same coin, really. Mostly, though, thinking that the social & technical differences in R and Python package development can be explained through quick recourse to OO vs. FP (when I bet the majority of academic package developers don’t even deeply understand OO or FP) is flippant at best. I really think more of it is the structure of academic rewards, and the predominance of academics in the R ecosystem.

    But that’s an aside. More generally, fragmenting the library would make it easier for new contributors to derive academic credit from their contributions.

    Cleaner Dependency Logic

    I think many of the library developers also feel limited by the strict adherence to a minimal set of dependencies, namely scipy and numpy. By splitting the package up into separate modules with potentially different dependency requirements, we legitimate contributors who want to provide new stuff with flashy new packages.

    To be clear, I think the way we do this right now is somewhat frustrating. If a contribution is done using only SciPy & Numpy and is sufficiently integrated into the rest of the library, it gets merged into “core” pysal. If it uses “extra” libraries but is still relevant to the project, we merge it into a module, contrib. This catch-all module contains some totally complete code from younger contributors, like the spint module for spatial interaction models or my handler module for formula-based spatial regression interfaces, as well as code from long-standing contributors, like the viz module. But, it also contains incomplete remnants of prior projects, put in contrib to make sure they weren’t forgotten. And, to make matters worse, none of the stuff in contrib is used in our continuous integration framework. So, even if an author writes test suites, they’re not run routinely, meaning that the compatibility clock is ticking every time code is committed to the module. Since it’s not unittested and documentation & quality standards aren’t the same as the code in core, it’s often easier to write from scratch when something breaks. Thus, fragmenting the package would “liberate” packages in contrib that meet standards of quality for introduction to core but have extra dependencies.

    But why is this necessary?

    Of course, we can do much of what fragmentation provides technologically using soft dependencies. At the module level, it’s actually incredibly easy. But, I have also built tooling to do this at the class/function level, and it works great. So, this particular idea about having multiple packages doesn’t solve what I think is fundamentally a social/human problem.

    The rules we’ve built around contribution do not actively support using the best tools for the job. Indeed, the social structure of two-tiered contribution, where the second tier has incredibly heterogeneous quality, intent, and no support for coverage/continuous integration testing, inhibits code reuse and magnifies not-invented-here syndrome intensely. We can’t exploit great packages like cytools, have largely avoided merging code that leverages improved computational runtimes (using numba & cython), and haven’t really (until my GSOC) programmed around pandas as a valid interaction method to the library.

Most of the barriers to this are, as I mentioned above, mental and social, not technical. Our code can be well-architected, even though we’ve implemented special structures to do things that are more commonly (sometimes more efficiently) solved in other packages or using other techniques.

    And, there’s some freaking cool stuff going on involving PySAL. Namely, the thing that’s been animating me is its use in Carto’s Crankshaft, which integrates some PySAL tooling into a PL/Python plugin for Postgres. They’ll be exposing our API (or a subset of it) to users through this wrapper, and that feels super cool! So, we’ve got good things going for our library. But, I think that continued progress needs to address these primarily social concerns, because the code, technologically, I think is more sound than one could expect from full-time academic authors.

    July 25, 2016 06:52 AM

    July 24, 2016

    SanketDG
    (coala)

    shrox
    (Tryton)

    Refactoring

    Right now I am working on refactoring the code that I have written till now.

I need to simplify functions, make them easier to understand, and make sure that my code conforms to the standards of the Tryton and Relatorio codebases.

    July 24, 2016 02:23 AM

    mr-karan
    (coala)

    GSoC Week8,9 Updates

For the last two weeks I have been busy making some more bears for coala. I found the following tools:

• write-good, which helps in writing good English documentation and checks text files for common English mistakes. I really liked this tool, so I wrapped it in a linter bear and implemented WriteGoodLintBear (a rough sketch of this kind of wrapper appears after this list).

    You can see it in action here:

• happiness, which, as the name suggests, is a tool that lints JS files for common syntax and semantic errors and conforms to a style that is well defined in its docs and is actually the one I like for JavaScript files. It’s a fork of Standard, which is another style guide, but happiness has a few changes I find better, so I wrapped it and implemented HappinessLintBear.

    You can see it in action here:

• httpolice, a linter for HTTP requests and responses. It can be used on a HAR file: in your browser’s Developer Tools, head over to the Network tab, right-click and save the requests as a HAR file, and the tool can then lint that file. We didn’t have anything of this kind in coala-bears yet, so I wrapped it and implemented HTTPoliceLintBear. There have been some issues with lxml dependencies on AppVeyor, and I’m figuring out how to solve them so that the tests pass and this PR also gets merged.
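For reference, a coala linter bear wrapping a command-line tool like write-good roughly takes the following shape (a sketch only; the output_regex here is my guess at write-good's output format, and the real WriteGoodLintBear may differ):

from coalib.bearlib.abstractions.Linter import linter


@linter(executable='write-good',
        output_format='regex',
        output_regex=r'(?P<message>.+) on line (?P<line>\d+) '
                     r'at column (?P<column>\d+)')
class WriteGoodLintBear:
    """
    Checks text files for common English style problems using write-good.
    """

    @staticmethod
    def create_arguments(filename, file, config_file):
        # write-good is simply invoked on the file being checked
        return (filename,)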

    Future Work

• Syntax Highlighting: There has been some clarity on how to implement this. Until now I have used the highlight class from Pygments, but I am now planning to make my own class, ConsoleText, which will help set specific attributes on certain parts of the text, like separating the bullet marks from the string and so on. I plan to work on this extensively so I can complete the task by the end of this week.

• coala-bears website: I’ll be starting off with a prototype of the website and some basic functionality, like filtering the bears on parameters other than the language they support.

    Happy Coding!

    July 24, 2016 12:10 AM

    July 23, 2016

    Yen
    (scikit-learn)

    Interactive Cython with IPython, no compilation anymore!

Debugging Cython is sometimes very annoying because, unfortunately, there aren’t many blog posts or tutorials about Cython on the Internet. We often need to learn it in a trial-and-error manner. To make things even worse, unlike Python, Cython code needs to be compiled every time we make a change to it, which makes the debugging process more tedious.

What if we could try out Cython in IPython notebooks, an interactive environment, without an explicit compilation step?

    Let’s see how this can be done.

    Ipython Notebooks

    NOTE: Feel free to skip this section if you are already familiar with it.

    At first, let’s see what IPython notebooks is.

    An IPython notebook is a powerful interactive shell, which lets you write and execute Python code in your web browser. Therefore, it is very convenient to tweak your code and execute it in bits and pieces with IPython. Besides that, it also has great support for interactive data visualization and use of GUI toolkits. For these reasons, IPython notebooks are widely used in scientific computing.

    For more installation details and tutorials, please see this site.

    Cython Problem

Traditionally, we use the distutils module to compile Cython code, which gives us full control over every step of the process. However, the main drawback of this approach is that it requires a separate compilation step. This is definitely a disadvantage, since one of Python’s strengths is its interactive interpreter, which allows us to play around with code and test how something works before committing it to a source file.

    Well, don’t worry, IPython notebook is here to save us!

    %%cython Magic

IPython integrates Cython flawlessly through some convenient commands that allow us to use Cython interactively from a live IPython session. These extra commands are IPython-specific commands called magic commands, and they start with either a single (%) or double (%%) percent sign. They provide functionality beyond what the plain Python interpreter supplies. IPython has several magic commands that allow dynamic compilation of Cython code; see here for more details.

Before we can use these Cython magic commands, we first need to tell IPython to load them. We do that with the %load_ext magic command in the IPython interactive interpreter, or in an IPython notebook cell:

    In [1]: %load_ext Cython
    

    There will be no output if %load_ext is successful, and IPython will issue an error message if it cannot find the Cython-related magics.

    Great! Now we can use Cython from IPython via the %%cython magic command:

In [2]: %%cython
        # cpdef (rather than cdef) also exposes the function to Python callers
        cpdef int add(int x, int y):
            return x + y
    

    The %%cython magic command allows us to write a block of Cython code directly in the IPython interpreter. After exiting the block with two returns, IPython will take the Cython code we defined, paste it into a uniquely named Cython source file, and compile it into an extension module. If compilation is successful, IPython will import everything from that module to make the function we defined available in the IPython interactive namespace. The compilation pipeline is still in effect, but it is all done for us automatically. We can now call the function we just defined:

    In [3]: add(1, 2)
    

    Cool! Now IPython will print the result of your function, i.e., 3, under this block of code.
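To get a feel for the payoff, you can compare the compiled function against a pure-Python equivalent with the %timeit magic (the exact numbers will vary by machine):

In [4]: def py_add(x, y):
   ...:     return x + y

In [5]: %timeit py_add(1, 2)

In [6]: %timeit add(1, 2)

Passing -a to the magic (%%cython -a) additionally shows an annotated view of the generated C code right in the notebook, which complements the inspection approach described below.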

    Generated C code

    Sometimes, it is a good practice to inspect the generated C source files to check the sanity of our program. The generated source files are located in the $IPYTHONDIR/cython directory (~/.ipython/cython on an OS X or *nix system). The module names are not easily readable because they are formed from the md5 hash of the Cython source code, but all the contents are there.

    Summary

I really wish I had known these convenient tips for debugging Cython code when I first learned Cython; they can save tons of time and spare your fingers a lot of effort :)

    Let’s run Cython code without overhead!

    July 23, 2016 11:52 PM

    Raffael_T
    (PyPy)

    Progress async and await


It's been some time, but I have made quite some progress on the new async feature of Python 3.5! There is still a bit to be done though, and the end of this year's Google Summer of Code is pretty close already. Whether I can finish in time will mostly be a matter of luck, since I don't know how much I still have to do in order for asyncio to work. The module depends on many new features from Python 3.3 up to 3.5 that have not been implemented in PyPy yet.

    Does async and await work already?
Not quite. PyPy now accepts async and await, though, and checks pretty much all places where they are allowed and where they are not. In other words, the parser is complete and has been tested.
The code generator is complete as well, so the right opcodes get executed in all cases.
The new bytecode instructions I need to handle are: GET_YIELD_FROM_ITER, GET_AWAITABLE, GET_AITER, GET_ANEXT, BEFORE_ASYNC_WITH and SETUP_ASYNC_WITH.
These opcodes do not work with regular generators but with coroutine objects. Those are based on generators; however, they do not implement __iter__ and __next__ and can therefore not be iterated over. Also, generators and generator-based coroutines (@asyncio.coroutine in asyncio) cannot yield from coroutines. [1]
I started implementing the opcodes, but I can only finish them after asyncio is working, as I need to test them constantly and can only do that with asyncio, because I am unsure what values normally lie on the stack. The same holds for some functions of coroutine objects. Coroutine objects are working; however, they are missing a few functions needed for the async/await syntax feature.
These two things are all that remains to be done, though; everything else is tested and should therefore work.

    What else has been done?
Only implementing async and await would have been too easy, I guess. With it comes a problem I already mentioned: the missing dependencies from Python 3.3 up to 3.5.
The sre module (which offers support for regular expressions) was missing a macro named MAXGROUPS (from Python 3.3), and its magic number had to be updated as well. The memoryview objects also got an update in Python 3.3 that is needed for an import: they now have a function called “cast”, which converts memoryview objects to any other predefined format.
I just finished implementing this as well; now I am at the point where threading.py says:
    _set_sentinel = _thread._set_sentinel
    AttributeError: 'module' object has no attribute '_set_sentinel'

    What to do next?
My next goal is to get asyncio working and the new opcodes implemented. Hopefully I can write about success in my next blog post, because I am sure I will need some time to test everything afterwards.

A developer tip for executing asyncio in pyinteractive (--withmod)
(I only write this as a hint because it is easily skipped in the PyPy docs, or at least that happened to me. The PyPy team has already thought about a solution for that though :) )
Asyncio needs some modules in order to work that are not loaded by default in pyinteractive. If someone stumbles across the problem where PyPy cannot find these modules, --withmod does the trick [2]. For now, --withmod-thread and --withmod-select are required.

    [1] https://www.python.org/dev/peps/pep-0492/
    [2] http://doc.pypy.org/en/latest/getting-started-dev.html#pyinteractive-py-options


Update (23.07.): asyncio can be imported and works! Well, that went better than expected :)
For now only the @asyncio.coroutine way of creating coroutines works, so for example the following code runs:

    import asyncio
    @asyncio.coroutine
    def my_coroutine(seconds_to_sleep=3):
        print('my_coroutine sleeping for: {0} seconds'.format(seconds_to_sleep))
        yield from asyncio.sleep(seconds_to_sleep)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(
        asyncio.gather(my_coroutine())
    )
loop.close()

    (from http://www.giantflyingsaucer.com/blog/?p=5557)

    And to illustrate my goal of this project, here is an example of what I want to work properly:

import asyncio

async def coro(name, lock):
    print('coro {}: waiting for lock'.format(name))
    async with lock:
        print('coro {}: holding the lock'.format(name))
        await asyncio.sleep(1)
        print('coro {}: releasing the lock'.format(name))

loop = asyncio.get_event_loop()
lock = asyncio.Lock()
coros = asyncio.gather(coro(1, lock), coro(2, lock))
try:
    loop.run_until_complete(coros)
finally:
    loop.close()

    (from https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-492)

The async keyword replaces @asyncio.coroutine, and await is written instead of yield from. "async with" and "async for" are additional features, allowing execution to be suspended in the "enter" and "exit" methods (an asynchronous context manager) and allowing iteration over asynchronous iterators, respectively.

    by Raffael Tfirst (noreply@blogger.com) at July 23, 2016 03:19 PM

    Ramana.S
    (Theano)

    Second Month Blog

    Hello there,
The work on the GraphToGPU optimizer was finally merged into the master of Theano, giving the bleeding edge an approximately 2-3 times speedup. Well, that is a huge thing. Now the compilation time for the graph in FAST_COMPILE mode had one small remaining block, which was created by local_cut_gpu_transfers. The nodes introduced into the graphs were host_from_gpu(gpu_from_host(host_from_gpu(Variable))) and gpu_from_host(host_from_gpu(gpu_from_host(Variable))) patterns. This caused the slowdown of local_cut_gpu_transfers, and when I tried to investigate where these patterns are created, I found they come from one of the AbstractConv2d optimizers. We (Fred and I) spent some time trying to filter out these patterns, but we finally concluded that this speedup wouldn't be worth the effort and dropped the idea for now.
There was also some work done on caching the Op classes from the base Op class, so that instances of an Op with the same parameters don't get recreated once one already exists. I tried to implement the caching from the Op class using a singleton, and I verified that instances with the same parameters are not recreated. But there are a few problems which require some higher-level refactoring. Currently the __call__ method for an Op is implemented in PureOp, which, when making a call to make_node, does not identify and pass all the parameters correctly. This parameter-passing issue would hopefully be resolved if all the Ops in Theano supported __props__, which would make it convenient to access the _props_dict and pass the parameters instead of using the generalized, unconventional way via *args and **kwargs. Currently, most of the Ops in the old backend do not implement __props__ and so cannot make use of the _props_dict. There are a few road blocks to this: instances of Elemwise would require a dict to be passed as a parameter, which is an unhashable type, and hence __props__ could not be implemented there. Early this week, work will begin on making that parameter a hashable type, paving the way for both of these PRs to get merged. Once they are merged, there should be at least a 0.5X speedup in the optimization time.
Finally, work has begun on implementing a CGT-style optimizer. This new optimizer performs optimization in topological-sort order. In Theano it is being implemented as a local optimizer, aimed at replacing the canonicalize phase. Currently Theano optimizes a node only "once". The main advantage of the new optimizer is that it can optimize a node more than once, by trying all the possible optimizations on the node until none of them apply: it applies an optimization to a node, then again tries all the optimizations on the newer (modified) node, and so on. There is one drawback to this approach: after two optimizations have been applied, the node being replaced no longer has the fgraph attribute, and hence optimizations that require this attribute cannot be tried. An example of the new optimizer's behavior is shown below:

    Current theano master:
    x ** 4 = T.sqr(x ** 2)

    This branch : 

    x ** 4 =  T.sqr(T.sqr(x))

The drawback of this branch is that we won't be able to do this type of speed-up from x ** 8 onwards. When profiled with the SBRNN network, the initial version of the draft seems to give an approximately 20 second speedup. Isn't that a good start? :D

    That's it for now folks! :)

    by Ramana Subramanyam (noreply@blogger.com) at July 23, 2016 08:26 AM

    July 22, 2016

    Avishkar Gupta
    (ScrapingHub)

    Formalising the Benchmark Suite, Some More Unit Tests and Backward Compatibility Changes

    In the past two weeks I focused my efforts on finalizing the benchmarking suite and improving test coverage.

From what Codecov says, we’re 83% of the way there regarding test coverage. As far as the performance of the new signals is concerned, the testing shows that the new signal API always takes less than half the time required by the old signal API, for both signal connection and the actual sending of the signal.

This is attributed mostly to the fact that running the combination of the getAllReceivers and liveReceivers functions on every dispatch took a huge amount of time and was the bottleneck of the process. As it currently stands, we’re not using the caching mechanism of the library, i.e. use_caching is always set to false, because the receivers which do not connect to a specific sender but rather to all senders require me to find a suitable key that can be weakref’d to make the entry in the WeakKeyDictionary. But enough about that, back to benchmarking.

As for the benchmarking process, Djangobench, the Django benchmarking library, does not currently benchmark signals; that is still on the project’s TODO list. It did, however, provide me with some excellent modules that I used to write the Scrapy benchmarking suite for signals. I would leave a link to it here, but I’m currently discussing with my mentor where to include it, as including it in the repo would require that we keep PyDispatcher as a dependency, since it is needed to perform a raw apples-to-apples comparison of the signal code. In this post I’m also sharing results that I got using Robert Kern’s line_profiler module.

(Figure: line_profiler output.)

As for the compatibility changes this cycle, I added support for the old-style Scrapy signals, which were just standard Python objects. In similar fashion to how I implemented backward compatibility for receivers without keyword arguments, I proxied the signals through the signal manager to implement backward compatibility for these objects. With that, the new signals can be safely integrated into Scrapy with no worries about breaking legacy code. In the coming weeks, I plan to finish test coverage, maybe add some signal benchmarks to scrapy bench, and work on documentation.

    July 22, 2016 11:00 PM

    Nelson Liu
    (scikit-learn)

    (GSoC Week 8) MAE PR #6667 Reflection: 15x speedup from beginning to end

    If you've been following this blog, you'll notice that I've been talking a lot about the weighted median problem, as it is intricately related to optimizing the mean absolute error (MAE) impurity criterion. The scikit-learn pull request I was working on to add aforementioned criterion to the DecisionTreeRegressor, PR #6667, has received approval from several reviewers for merging. Now that the work for this PR is complete, I figure that it's an apt time to present a narrative of the many iterations it took to converge to our current solution for the problem.

    Iteration One: Naive Sorting

    The Criterion object that is the superclass of MAE has a variety of responsibilities during the process of decision tree construction, primarily evaluating the impurity of the current node, and evaluating the impurity of all the possible children to find the best next split. In the first iteration, every time we wanted to calculate the impurity of a set of samples (either a node, or a possible child), we would sort this set of samples and extract the median from it.
    After implementing this, I ran some benchmarks to see how fast it was compared to the Mean Squared Error (MSE) criterion currently implemented in the library. I used both the classic Boston housing price dataset and a larger, synthetic dataset with 1000 samples and 100 features each to compare. Training was done on 0.75 of the total dataset, and the other 0.25 was used as a held-out test set for evaluation.
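A benchmark of this shape looks roughly like the following (a sketch rather than the exact script used here; criterion="mae" is the option this PR adds):

import time
import numpy as np
from sklearn.datasets import load_boston
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

boston = load_boston()
X, y = boston.data, boston.target

# 0.75 / 0.25 train-test split
rng = np.random.RandomState(0)
idx = rng.permutation(X.shape[0])
n_train = int(0.75 * X.shape[0])
train, test = idx[:n_train], idx[n_train:]

for criterion in ("mse", "mae"):
    start = time.time()
    tree = DecisionTreeRegressor(criterion=criterion, random_state=0)
    tree.fit(X[train], y[train])
    pred = tree.predict(X[test])
    print(criterion, "time: %.3fs" % (time.time() - start),
          "MSE:", mean_squared_error(y[test], pred),
          "MAE:", mean_absolute_error(y[test], pred))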

    Boston Housing Dataset Benchmarks: Iter. 1

    MSE time: 105 function calls in 0.004 seconds  
    MAE time:  105 function calls in 0.175 seconds
    
    Mean Squared Error of Tree Trained w/ MSE Criterion: 32.257480315  
    Mean Squared Error of Tree Trained w/ MAE Criterion: 29.117480315
    
    Mean Absolute Error of Tree Trained w/ MSE Criterion: 3.50551181102  
    Mean Absolute Error of Tree Trained w/ MAE Criterion: 3.36220472441  
    

    Synthetic Dataset Benchmarks: Iter. 1

    MSE time: 105 function calls in 0.089 seconds  
    MAE time:  105 function calls in 15.419 seconds
    
    Mean Squared Error of Tree Trained w/ MSE Criterion: 0.702881265958  
    Mean Squared Error of Tree Trained w/ MAE Criterion: 0.66665916831
    
    Mean Absolute Error of Tree Trained w/ MSE Criterion: 0.650976429446  
    Mean Absolute Error of Tree Trained w/ MAE Criterion: 0.657671579992  
    

This sounds reasonable enough, but we quickly discovered after looking at the numbers that it was intractable; while sorting is quite fast in general, sorting in the process of finding the children was completely unrealistic. For a sample set of size n, we would divide it into n-1 partitions of left and right children and sort each one, at every node. The larger dataset made MSE take 22.25x more time, but it made MAE take 88.11x (!) more time. This result was obviously unacceptable, so we began thinking about how to optimize; this led us to our second development iteration.

    Iteration 2: MinHeap to Calculate Weighted Median

In iteration two, we implemented the algorithm / methodology I discussed in my week 6 blog post. With this method, we did away with the time associated with sorting every sample set for every node and possible child, and instead "saved" sorts, using a modified bubblesort to insert and remove new elements from the left and right child heaps efficiently. This algorithm had a substantial impact on the code --- rerunning the benchmarks we used earlier yielded the following results (MSE results remained largely the same up to run-by-run variation, and accuracy is the same and is thus omitted):

    Boston Housing Dataset Benchmarks: Iter. 2

    MSE time: 105 function calls in 0.004s (was: 0.004s)  
    MAE time:  105 function calls in 0.276s (was: 0.175s)  
    

    Synthetic Dataset Benchmarks: Iter. 2

    MSE time: 105 function calls in 0.065s (was: 0.089s)  
    MAE time:  105 function calls in 5.952s (was: 15.419s)  
    

    After this iteration, MAE is still quite slower than MSE, but it's a definite improvement from naive sorting (especially when using a large dataset). I found it interesting that the new method is actually a little bit slower than the naive method we first implemented on the relatively small Boston dataset (0.276s vs 0.175s, respectively). My mentors and I hypothesized that this might be due to the time cost associated with creating the WeightedMedianCalculators (the objects that handled the new median calculation), though their efficiency in calculation is supported by the speed increase from 15.419s to 5.952s on the larger randomly generated dataset. 5.952 seconds on a dataset with 1000 samples is still slow though, so we kept going.

    Iteration 3: Pre-allocation of objects

We suspected that there could be a high cost associated with spinning up objects used to calculate the weighted median. This is especially important because the majority of the tree code in scikit-learn is written in Cython, which disallows the use of Python objects and functions. This is because we run the Cython code without the Python GIL (global interpreter lock). The GIL is a mutex that prevents multiple native threads from executing Python bytecodes at once, so running without the GIL makes our code a lot faster. However, because our WeightedMedianCalculators are Python objects, we unfortunately need to reacquire the GIL to instantiate them. We predicted that this could be a major source of the bottleneck. As a result, I implemented a reset function in the objects to clear them back to their state at construction, which could be executed without the GIL. When we first ran the C-level constructor (it is run at every node, as opposed to the Python constructor that is run only once), we evaluated whether the WeightedMedianCalculators had been created or not; if they had not been, we reacquired the GIL and created them. If they had, we simply reset them. This allowed us to only reacquire the GIL once throughout the algorithm, which, as predicted, led to substantial speedups. Running the benchmarks again displayed:

    Boston Housing Dataset Benchmarks: Iter. 3

    MSE time: 105 function calls in 0.009s (was: 0.004s, 0.004s)  
    MAE time:  105 function calls in 0.038s (was: 0.276s, 0.175s)  
    

    Synthetic Dataset Benchmarks: Iter. 3

    MSE time: 105 function calls in 0.065s (was: 0.065s, 0.089s)  
    MAE time:  105 function calls in 0.978s (was: 5.952s, 15.419s)  
    

Based on the speed improvement from the most recent changes, it's reasonable to conclude that a large amount of time was spent re-acquiring the GIL. With this approach, we cut down the time spent reacquiring the GIL by quite a significant amount since we only need to do it once, but ideally we'd like to do it zero times. This led us to our fourth iteration.

    Iteration 4: Never Re-acquire the GIL

    Constructing the WeightedMedianCalculators requires two pieces of information, n_outputs (the number of outputs to predict) and n_node_samples (the number of samples in this node). We need to create a WeightedMedianCalculator for each output to predict, and the internal size of each should be equal to n_node_samples.
    We first considered whether we could allocate the WeightedMedianCalculators at the Splitter level (the splitter is in charge of finding the best splits, and uses the Criterion to do so). In splitter.pyx, the __cinit__ function (Python-level constructor) only exposes the value of n_node_samples and we lack the value of n_outputs. The opposite case is true in criterion.pyx, where the __cinit__ function is only shown the value of n_outputs and does not get n_node_samples until C-level init time, hence why we previously were constructing the WeightedMedianHeaps in the init function and cannot completely do it in __cinit__. If we could do it completely in the __cinit__, we would not have to reacquire the GIL because the __cinit__ operates on the Python level in the first place.
    As a result, we simply modified the __cinit__ of the Criterion objects to expose the value of n_node_samples, allowing us to do all of the allocation of the objects at the Python-level without having to specifically reacquire the GIL. We reran the benchmarks on this, and saw minor improvements in the results:

    Boston Housing Dataset Benchmarks: Iter. 4

    MSE time: 105 function calls in 0.003s (was: 0.009s, 0.004s, 0.004s)  
    MAE time:  105 function calls in 0.032s (was: 0.038s, 0.276s, 0.175s)  
    

    Synthetic Dataset Benchmarks: Iter. 4

    MSE time: 105 function calls in 0.065s (was: 0.065s, 0.065s, 0.089s)  
    MAE time:  105 function calls in 0.961s (was: 0.978s, 5.952s, 15.419s)  
    

    Conclusion

    So after these four iterations, we managed to get a respectable 15x speed improvement. There's still a lot of work to be done, especially with regards to speed on larger datasets; however, as my mentor Jacob commented, "Perfect is the enemy of good", and those enhancements will come in future (very near future) pull requests.

    If you have any questions, comments, or suggestions, you're welcome to leave a comment below.

    Thanks to my mentors Raghav RV and Jacob Schreiber for their input on this problem; we've run through several solutions together, and they are always quick to point out errors and suggest improvements.

    You're awesome for reading this! Feel free to follow me on GitHub if you want to track the progress of my Summer of Code project, or subscribe to blog updates via email.

    by Nelson Liu at July 22, 2016 10:53 PM

    aleks_
    (Statsmodels)

    Bugs, where art thou?

    The latest few weeks were all about searching for bugs. The two main bugs (both related to the estimation of parameters) showed up in two cases:
• When including seasonal terms and a constant deterministic term in the vector error correction model (VECM), the estimate of the constant term differed from the one produced by the reference software JMulTi, which is written by Lütkepohl. Interestingly, my results did equal those printed in the reference book (also written by Lütkepohl), so I believe that JMulTi got a corresponding update between the release of the book and the release of the software - which would also mean that the author of the reference book made the same mistake as I did ;) Basically the error was the result of a wrong construction of the matrix holding the dummy variables. Instead of the following pattern (assuming four seasons, e.g. quarterly data):
    [[1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, ..., 1, 0, 0, 0], 
     [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, ..., 0, 1, 0, 0],
     [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, ..., 0, 0, 1, 0]]
1 / (number of seasons) had to be subtracted from each element of the matrix above (see the sketch after this list). This isn't described in the book, and it is not the way of defining dummy variables that I learned in my lecture on regression analysis. And because the estimates of the seasonal parameters were actually correct (the wrong matrix only had side effects on the constant terms), I kept searching for the bug in a lot of different places...
• A small deviation in certain parameters occurred when deterministic linear trends were assumed to be present. Thanks to an R package handling VECM, called tsDyn, I could see that my results exactly matched those produced by tsDyn when specifying the linear trend to be inside the cointegration relation in the R package. On the other hand, the tsDyn output equaled that of JMulTi when tsDyn did not treat the linear trend as part of the cointegration relation. After I had seen that, reimplementing things to produce the JMulTi output in Python was easy. But here, too, I had searched a lot for bugs before.
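A quick NumPy sketch of the corrected dummy construction (my own illustration of the fix, not the actual statsmodels code):

import numpy as np

def centered_seasonal_dummies(nobs, seasons):
    """Seasonal dummy matrix with 1/seasons subtracted from every element."""
    pattern = np.eye(seasons)[:-1]                        # (seasons-1) x seasons block
    dummies = np.tile(pattern, (1, nobs // seasons + 1))[:, :nobs]
    return dummies - 1.0 / seasons

print(centered_seasonal_dummies(8, 4))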
Now I am happy that the code works, even if the bugs have thrown me behind the time schedule. I expect that coding will now continue much more smoothly.
The only thing that worries me is the impression the bug hunting made on my supervisors. Being unable to push code while searching for bugs may have looked as if I was not doing anything, though I spent hours and hours reading my code and checking it against Lütkepohl's book. So while I was working much more than 40 hours per week in the last few weeks (once even without a single day off), it may have looked completely different.
What counts now is that I will continue to give my best in the remaining weeks of GSoC, even if I don't get a passing grade from my supervisors. After all, it's not about the money; it's about being proud of the end product and about knowing that one has given his very best : )

    by yogabonito (noreply@blogger.com) at July 22, 2016 08:32 PM

    Upendra Kumar
    (Core Python)

    Creating documentation with Sphinx

This week I worked on creating documentation and on writing a web crawler for my new feature of giving users the option of installing packages from PythonLibs.

    It’s really a great tool for creating docs. In just few steps I could create docs as compared to creating a whole website based on django or using static webpages on github.

    We can create docs in few steps :

    1. mkdir docs
    2. Go to docs directory and run
      sphinx-quickstart
    3. Navigate to
      docs/source/conf.py

      and change :

      sys.path.insert(0, os.path.abspath('../..'))
    4. Now run
      sphinx-apidoc -f -o source/ ../mypackage/
    5. Our directory structure should be like this :
      myproject/
      |-- README
      |-- setup.py
      |-- myvirtualenv/
      |-- mypackage/
      |   |-- __init__.py
      |   `-- mymodule.py
      `-- docs/
          |-- MakeFile
          |-- build/
          `-- source/
      
    6. Finally run the following command to create html files from .rst files
      make html

    We can also play with MakeFile to configure the settings based on our preference.
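One detail worth noting: the .rst files that sphinx-apidoc generates rely on the autodoc extension, so it should be enabled in docs/source/conf.py (viewcode is optional but handy):

# docs/source/conf.py
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
]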


    by scorpiocoder at July 22, 2016 06:05 PM

    ghoshbishakh
    (dipy)

    Google Summer of Code Progress July 22

It has been about 3 weeks since the midterm evaluations. The dipy website is gradually heading towards completion!

(Figure: dipy home page screenshot.)

    Progress so far

    The documentation has been completely integrated with the website and it is synced automatically from the github repository where the docs are hosted.

The honeycomb gallery on the home page has been replaced with a carousel of images with content overlays, which will allow us to display important announcements at the top.

    The news feed now has sharing options for Facebook, Google Plus and Twitter.

    Google analytics has been integrated for monitoring traffic.

There are many performance optimizations, like introducing a caching layer and enabling GZip middleware. Now the Google PageSpeed score is even higher than that of the old dipy website.

(Figure: dipy PageSpeed score.)

All pages of the website have meta tags for search engine optimization.

And of course there have been lots of bug fixes, and the website scales a lot better on mobile devices.

    The current pull request is #13

    You can visit the site under development at http://dipy.herokuapp.com/

    Documentation Integration

The documentation is now generated and uploaded to the dipy_web repository using a script. Previously an HTML version of the docs was built, but this script builds the JSON docs, which allows us to integrate them within the Django templates very easily. Then, using the GitHub API, the list of documentation versions is synced with the Django website models.

    def update_documentations():
        """
        Check list of documentations from gh-pages branches of the dipy_web
        repository and update the database (DocumentationLink model).
    
        To change the url of the repository in which the documentations will be
        hosted change the DOCUMENTATION_REPO_OWNER and DOCUMENTATION_REPO_NAME
        in settings.py
        """
        url = "https://api.github.com/repos/%s/%s/contents/?ref=gh-pages" % (
            settings.DOCUMENTATION_REPO_OWNER, settings.DOCUMENTATION_REPO_NAME)
        base_url = "http://%s.github.io/%s/" % (
            settings.DOCUMENTATION_REPO_OWNER, settings.DOCUMENTATION_REPO_NAME)
        response = requests.get(url)
        response_json = response.json()
        all_versions_in_github = []
    
        # add new docs to database
        for content in response_json:
            if content["type"] == "dir":
                version_name = content["name"]
                all_versions_in_github.append(version_name)
                page_url = base_url + version_name
                try:
                    DocumentationLink.objects.get(version=version_name)
                except ObjectDoesNotExist:
                    d = DocumentationLink(version=version_name,
                                          url=page_url)
                    d.save()
        all_doc_links = DocumentationLink.objects.all()
    
        # remove deleted docs from database
        for doc in all_doc_links:
            if doc.version not in all_versions_in_github:
                doc.delete()

Now admins with the proper permissions can select which documentation versions to display on the website. The selected versions are shown in the navbar dropdown menu. This is done by passing the selected docs into the context in a context processor.

    def nav_pages_processor(request):
        pages = WebsiteSection.objects.filter(section_type="page",
                                              show_in_nav=True)
        all_doc_displayed = DocumentationLink.objects.filter(displayed=True)
        return {'pages_in_nav': pages, 'all_doc_displayed': all_doc_displayed}

Now, when a user requests a documentation page, the doc in JSON format is retrieved from GitHub and parsed, and the URLs in the docs are processed so that they work properly within the Django site. Then the docs are rendered in a template.

    @cache_page(60 * 30)  # cache the view for 30 minutes
    def documentation(request, version, path):
        context = {}
        repo_info = (settings.DOCUMENTATION_REPO_OWNER,
                     settings.DOCUMENTATION_REPO_NAME)
        base_url = "http://%s.github.io/%s/" % repo_info
        url = base_url + version + "/" + path + ".fjson"
        response = requests.get(url)
        if response.status_code == 404:
            url = base_url + version + "/" + path + "/index.fjson"
            response = requests.get(url)
            if response.status_code == 404:
                raise Http404("Page not found")
        url_dir = url
        if url_dir[-1] != "/":
            url_dir += "/"
        response_json = response.json()
        response_json['body'] = response_json['body'].replace("src=\"",
                                                              "src=\"" + url_dir)
        page_title = "DIPY : Docs %s - %s" % (version,
                                              strip_tags(response_json['title']),)
        context['meta'] = get_meta_tags_dict(title=page_title)
        context['doc'] = response_json
        return render(request, 'website/documentation_page.html', context)

(Figure: dipy documentation page screenshot.)

(Figure: dipy documentation tutorial page screenshot.)

    Cache

Processing the JSON documentation every time a page is requested is an overhead. Also, on the home page the social network feeds were fetched on every request, which is not required. So a cache is used to reduce the overhead. In Django, adding a cache is really simple: all we need to do is set up the cache settings and add some decorators to the views.

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
            'LOCATION': 'dipy-cache',
        }
    }

    For now we are using local memory cache, but in production it will be replaced with memcached.
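
    For reference, a possible production configuration could look like the following; the backend path is Django's built-in memcached backend and the address is just an example, not the actual deployment setting.

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
        }
    }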

    We are keeping the documentation view and the main page view in cache for 30 minutes.

    @cache_page(60 * 30)  # cache the view for 30 minutes
    def documentation(request, version, path):
        context = {}
        repo_info = (settings.DOCUMENTATION_REPO_OWNER,
                     settings.DOCUMENTATION_REPO_NAME)
        base_url = "http://%s.github.io/%s/" % repo_info
      .... .... ... ..

    But this creates a problem: when we change a section, news post, or publication in the admin panel, the changes are not reflected in the views and we would have to wait up to 30 minutes to see them. To solve this, the cache is cleared whenever changes are made to the sections, news, etc.

    class NewsPost(models.Model):
        title = models.CharField(max_length=200)
        body_markdown = models.TextField()
        body_html = models.TextField(editable=False)
        description = models.CharField(max_length=140)
        post_date = models.DateTimeField(default=timezone.now)
        created = models.DateTimeField(editable=False, auto_now_add=True)
        modified = models.DateTimeField(editable=False, auto_now_add=True)
    
        def save(self, *args, **kwargs):
            html_content = markdown.markdown(self.body_markdown,
                                             extensions=['codehilite'])
            print(html_content)
            # bleach is used to filter html tags like <script> for security
            self.body_html = bleach.clean(html_content, allowed_html_tags,
                                          allowed_attrs)
            self.modified = datetime.datetime.now()
    
            # clear the cache
            cache.clear()
    
            # Call the "real" save() method.
            super(NewsPost, self).save(*args, **kwargs)
    
        def __str__(self):
            return self.title

    Search Engine Optimizations

    One of the most important steps for SEO is adding proper meta tags to every page of the website. These also include the Open Graph tags and the Twitter card tags so that when a page is shared on a social network, it is properly rendered with the correct title, description, thumbnail, etc.

    The django-meta app provides a very useful template that can be included to render the meta tags properly, provided a meta object is passed in the context. Ideally every page should have its own unique meta tags, but there must be a fallback so that if no meta attributes are specified, some default values are used.

    So in order to generate the meta objects we have this function:

    def get_meta_tags_dict(title=settings.DEFAULT_TITLE,
                           description=settings.DEFAULT_DESCRIPTION,
                           keywords=settings.DEFAULT_KEYWORDS,
                           url="/", image=settings.DEFAULT_LOGO_URL,
                           object_type="website"):
        """
        Get meta data dictionary for a page
    
        Parameters
        ----------
        title : string
            The title of the page used in og:title, twitter:title, <title> tag etc.
        description : string
            Description used in description meta tag as well as the
            og:description and twitter:description property.
        keywords : list
            List of keywords related to the page
        url : string
            Full or partial url of the page
        image : string
            Full or partial url of an image
        object_type : string
            Used for the og:type property.
        """
        meta = Meta(title=title,
                    description=description,
                    keywords=keywords + settings.DEFAULT_KEYWORDS,
                    url=url,
                    image=image,
                    object_type=object_type,
                    use_og=True, use_twitter=True, use_facebook=True,
                    use_googleplus=True, use_title_tag=True)
        return meta

    And in settings.py we can specify some default values:

    # default meta information
    DEFAULT_TITLE = "DIPY - Diffusion Imaging In Python"
    DEFAULT_DESCRIPTION = """Dipy is a free and open source software
                          project for computational neuroanatomy,
                          focusing mainly on diffusion magnetic resonance
                          imaging (dMRI) analysis. It implements a broad
                          range of algorithms for denoising,
                          registration, reconstruction, tracking,
                          clustering, visualization, and statistical
                          analysis of MRI data."""
    DEFAULT_LOGO_URL = "http://dipy.herokuapp.com/static/images/dipy-thumb.jpg"
    DEFAULT_KEYWORDS = ['DIPY', 'MRI', 'Diffusion Imaging In Python']
    
    # django-meta settings
    META_SITE_PROTOCOL = 'https'
    META_SITE_DOMAIN = 'dipy.herokuapp.com'

    dipy SEO share page screenshot

    Google Analytics

    Adding Google Analytics is very simple. All we need to do is put a code snippet in every template, or just in the base template that is extended by all other templates. But in order to make it easier to customize, I have kept it as a context processor that takes the tracking ID from settings.py and generates the code snippet for the templates.

    def google_analytics_processor(request):
        tracking_id = settings.GOOGLE_ANALYTICS_TRACKING_ID
        tracking_code = """<script>
          (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
          (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
          m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
          })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
    
          ga('create', '%s', 'auto');
          ga('send', 'pageview');
    
        </script>""" % (tracking_id,)
        return {'google_analytics': tracking_code}

    What’s next

    We have to add more documentation versions (the older ones) and add a hover button on the documentation pages to hop from one documentation version to another, just like the Django documentation.

    We have to design a gallery page that will contain images, videos and tutorials.

    I am currently working on a github data visualization page for visualization of dipy contributors and activity in the dipy repository.

    Will be back with more updates soon! :)


    July 22, 2016 05:10 PM

    liscju
    (Mercurial)

    Coding Period - VII-VIII Week

    The main thing I managed to do in the last two weeks was to make new clients (with the redirection feature) put files in the redirection location themselves, instead of pushing them to the main repository. To obtain the redirection destination, the new client asks the main repo server for this information and then communicates directly with the redirection server. This behaviour is correct because it guarantees that a push transaction will only succeed when the client has successfully put all large files in the redirection destination. Otherwise the transaction fails, so the main repo server will never hold a revision whose large files are missing from the redirection destination.

    The second thing I did was to add and tweak the current test cases for the redirection module.

    The next thing I did was to research what functionality the redirection server should have. There was some discussion about whether the server should be thin or rich in functionality, but the general conclusion is that it should be thin - it should only support getting files and pushing files. The one thing we do demand from the redirection server is that it checks whether a pushed large file has the proper hash, because that is the only way to be sure that subsequent clients will download files with the proper content.

    The last thing I managed to do was to stop old clients' uploads from being saved temporarily before being sent to the redirection server. Until now those files were saved because, when an old client pushes files, the main repo server doesn't know the size of the file and as a result doesn't know how to set Content-Length in the request to the redirection server. This was overcome by using chunked transfer encoding. This feature of the HTTP/1.1 protocol enables sending files chunk by chunk, knowing only the size of the single chunk being sent. You can read more about this on Wikipedia:

    https://en.wikipedia.org/wiki/Chunked_transfer_encoding
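
    This is not the actual Mercurial code, just a minimal illustration of the idea using the requests library and a made-up endpoint: when the request body is a generator, requests omits Content-Length and sends the data with chunked transfer encoding.

    import requests

    def read_in_chunks(fileobj, chunk_size=64 * 1024):
        """Yield the file content piece by piece, without knowing its total size."""
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            yield chunk

    with open('largefile.bin', 'rb') as f:
        # 'http://redirection-server/upload' is a hypothetical endpoint
        requests.put('http://redirection-server/upload', data=read_in_chunks(f))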

    by Piotr Listkiewicz (noreply@blogger.com) at July 22, 2016 04:09 PM

    Abhay Raizada
    (coala)

    week full of refactor

    My project has grown a lot now; we are officially going to support C, C++, Python 3, JavaScript, CSS and Java with our generic algorithms, though they'll still be experimental owing to the nature of the bears.

    The past two weeks were heavily focused on refactoring the algorithms of the AnnotationBear and the IndentationBear. The IndentationBear received only small fixes, while the AnnotationBear had to undergo a change in its algorithm; the new and improved algorithm also adds the ability to distinguish between single-line and multi-line strings, whereas earlier there were just strings.

    The IndentationBear is almost close to completion barring basic things like:

    • It still messes up your doc strings/ multi-line strings.
    • Still no support for keyword indents.

    The next few weeks' efforts will go into introducing various indentation styles into the bear and fixing these issues, before we move on to the LineBreakBear and the FormatCodeBear.


    by abhsag at July 22, 2016 03:02 PM

    Prayash Mohapatra
    (Tryton)

    Few methods left

    Well yes, according to my Trello board, I am just a couple of methods away from completely porting the Import/Export feature from Python (GTK) to JavaScript (sao). The journey now feels rewarding, especially since I just learnt that GNU Health uses Tryton as their framework too.

    There have been no problems as such in the last two weeks. I made the predefined exports usable: they can be created, saved and removed. I can now get the records selected in the tab and fetch the relevant data from the ‘export_data’ RPC call, and I have gained confidence in making RPC calls in general.

    I am feeling comfortable around promises now. I smile at the times when the folks at my college club would use a promise for every concurrency issue, and I would be staring at them poker-faced.

    Would soon move into writing the tests for the feature, something I am waiting eagerly for. Have a nice weekend.

    July 22, 2016 11:30 AM

    Ravi Jain
    (MyHDL)

    Started Receive Engine!

    It's been a long time since my last post (2 weeks, phew)! Sorry for the slump. Anyway, during this period I successfully merged the Transmit Engine after my mentor's review. I later realised that I missed adding the client underrun functionality, which is used to corrupt the current frame transmission. I shall make sure to add that in the next merge.

    Next I started looking at GMII, which partly stalled my work because I was unable to clearly understand what I had to do for it. So I decided to move on and complete the Receive Engine with the address filter first. So far I have finished receiving the destination address from the data stream and filtering it by matching the address table against the frame's destination address. If there is a match, the receiver starts forwarding the stream to the client side, otherwise it just ignores it.

    Next I look forward to adding error-check functionality to be able to assert a good/bad frame at the end of the transmission.


    by ravijain056 at July 22, 2016 06:52 AM

    jbm950
    (PyDy)

    GSoC Week 8 & 9

    Last week I did not end up writing a blog post and so I am combining that week's post with this week's. Last week I attended the SciPy 2016 conference and was able to meet my mentor, and many other contributors to SymPy, in person. I was also able to help out with the PyDy tutorial. During my time at the conference (and this current week) I was able to flesh out the remaining details on the different portions of the project. I have updated PR #353 to reflect the API decisions for SymbolicSystem (previously eombase.EOM).

    In line with trying to put the finishing touches on implementation details before diving in to code, Jason and I met with someone who has actually implemented the algorithm in the past to help us with details surrounding Featherstone’s method. He also pointed me to a different description of the same algorithm that may be easier to implement.

    This week I also worked on rewriting the docstrings in physics/mechanics/body.py because I found the docstrings currently there to be somewhat confusing. I also reviewed one of Jason's PRs, where he reduces the amount of work that *method.rhs() has to do when inverting the mass matrix by pulling out the kinematical information before the inversion takes place.

    Future Directions

    With the work of these past two weeks focused on settling the design of the different parts of the project, I will start implementing these various parts next week. I will first work on finishing off the SymbolicSystem object and then move towards implementing the OrderNMethod. This work should be very straightforward with all the work that has been put into planning the APIs.

    PR’s and Issues

    • (Merged) Speeds up the linear system solve in KanesMethod.rhs() PR #10965
    • (Open) Docstring cleanup of physics/mechanics/body.py PR #11416
    • (Open) [WIP] Created a basis on which to discuss EOM class PR #353

    July 22, 2016 12:00 AM

    July 21, 2016

    srivatsan_r
    (MyHDL)

    Clarity on the Project

    It has been a long time since I last updated my blog. I posted a block diagram in the previous post saying that this is what I will be doing next. My mentor then told me that completing the RISC-V core and making it functional will itself take a lot of time, so video streaming is not required at the moment.

    So, for the rest of my GSoC period I will be working on the RISC-V core. I will be doing the project along with another GSoC participant. After reviewing a lot of implementations of RISC-V, my partner and our mentor chose the V-Scale RISC-V processor core (which is a Verilog version of the Z-Scale RISC-V processor).

    My partner had already completed the decoder of the processor during the first half of GSoC. He was getting a return type mismatch error when trying to convert the decoder module to Verilog. We couldn't figure out why this error was occurring. Then, after reviewing the code carefully, I was able to spot the error.

    def fun(value, sig):
        if value == 0:
            return sig[4:2]
        else:
            return sig[5:2]

    The function given above cannot be converted to Verilog. This is because the length of the intbv returned by the function varies in each branch of the if-else. This cannot be modelled by a Verilog function, which must have a definite length for its return type, and hence the error.

    To solve this error I had to inline the code and remove the function. After rectifying this error the code converted to Verilog correctly. I then made a pull request to the dev branch of the repo meetshah1995/riscv.
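
    Roughly, the inlining fix looks like the following sketch (the signal names are made up, this is not the actual pull request): the slice is selected directly in the calling block, so every branch assigns to the same fixed-width target signal and no variable-width return type has to be inferred.

    # Hypothetical inlined version: 'result' is assumed to be a Signal wide
    # enough to hold either slice, so the converter sees a fixed-width target.
    if value == 0:
        result.next = sig[4:2]
    else:
        result.next = sig[5:2]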

    The code for the ALU was ported to MyHDL by my partner and I created some tests for the module. While doing this I learnt some basics, like the fact that in Verilog ‘>>’ denotes a logical right shift and ‘>>>’ denotes an arithmetic right shift, whereas in Python ‘>>’ denotes an arithmetic right shift. There is a catch here in MyHDL –

    a >> 2 # If a is of type Signal/intbv '>>' works as logical right shift.
    a >> 2 # If a is of type int '>>' works as arithmetic right shift.

    That is an interesting fact, isn’t it?

    UPDATE :

    When using MyHDL, arithmetic vs logical shift is determined by the signedness of the Signal. If it is unsigned, a logical right shift happens; when it is signed, an arithmetic right shift happens.


    by rsrivatsan at July 21, 2016 06:25 PM

    Ranveer Aggarwal
    (dipy)

    Going 3D: An Orbital Menu

    The next UI element is a menu with items that circle around a 3D object while still facing the camera.
    Basically, we need a menu that follows a 3D object. If the 3D object is moved, the menu should follow it, while still facing the camera. For now, we’d like the elements of the menu to be arranged in a circle.

    Understanding the Follower Menu

    A vtkFollower inherits from vtkActor and takes in the renderer’s active camera as an attribute. And that is it! That’s all that’s required to get an element to follow the camera.
    This is what a vtkFollower looks like:

    # mapper = ...
    followActor = vtk.vtkFollower()
    followActor.SetMapper(mapper)
    followActor.SetCamera(renderer.GetActiveCamera())
    

    Building an Assembly

    Now, if we keep adding actors like this, we’ll run into a problem, that is, all of them will face the camera, but since their origins are separate, they’ll move differently. They will move with respect to the world coordinate system’s origin, and not the object’s.
    The solution? A vtkAssembly. In a vtkAssembly, all objects move together. An assembly is a kind of aggregate actor, with properties similar to those of a vtkActor.
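
    A minimal sketch of that combination (assuming an existing renderer and a list of mappers; the helper name is made up): create one follower per menu item and add them all as parts of a single assembly so they share an origin and move as a unit.

    import vtk

    def build_follower_menu(renderer, mappers):
        # Group all follower actors in one assembly so they move together
        assembly = vtk.vtkAssembly()
        for mapper in mappers:
            follower = vtk.vtkFollower()
            follower.SetMapper(mapper)
            follower.SetCamera(renderer.GetActiveCamera())
            assembly.AddPart(follower)
        renderer.AddActor(assembly)
        return assembly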

    Marc-Alex had worked on a vtkAssembly-vtkFollower combination and that gave me a really good head start. Here's how his menu looked.

    A Basic Follower Menu

    Now, the task is to integrate this menu with our existing DIPY framework and modify the previously created elements to work with this menu.

    July 21, 2016 07:04 AM

    Anish Shah
    (Core Python)

    GSoC'16: Week 7 and 8

    GSoC

    Makefile

    Last week, I told you guys about adding the Fedora docker image. After that, I added an OS argument to the Makefile. Now developers can choose between Ubuntu and Fedora while building the docker image. I have created two separate Dockerfiles, one for Ubuntu and one for Fedora. You can find these changes in the same pull request here.

    Reviews

    I spent the other part of the last two weeks updating the patches according to the reviews from my mentor Maciej Szulik and other PSF members - @berker.peksag and @r.david.murray.

    Add GitHub PR field on issue page

    I renamed the schema fields to shorter names, e.g. github_pullrequest_url to just pull_request, because in the future we might want to move to a different provider. Likewise, on the HTML side, I renamed a few names to shorter ones. You can check out the patch here.

    In the first patch, after extracting the pull request id, I used to do a lookup in the DB to check whether the pull request already exists there. @r.david.murray suggested not doing a lookup and just doing a mechanical translation. The updated patch is here.
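
    A hypothetical sketch of what such a mechanical translation could look like; the regex and the repository name are my assumptions, not the actual Roundup patch.

    import re

    def pr_url(value, repo="python/cpython"):
        """Turn a pull request number like '1234' or '#1234' into a GitHub URL."""
        match = re.match(r'^\s*#?(\d+)\s*$', str(value))
        if match is None:
            raise ValueError("not a pull request number: %r" % (value,))
        return "https://github.com/%s/pull/%s" % (repo, match.group(1))

    print(pr_url("1234"))  # https://github.com/python/cpython/pull/1234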

    Thank you for reading this blogpost. This is it for today. See you again. :)

    July 21, 2016 12:00 AM

    July 20, 2016

    Yen
    (scikit-learn)

    How to set up 32bit scikit-learn on Mac without additional installation

    Sometimes you may want to know how scikit-learn behaves when it's running on 32-bit Python. This blog post tries to give the simplest solution.

    Step by Step

    Below I’ll go through the procedure step by step:

    I. Type the following command and make sure it outputs 2147483647.

    arch -32 /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python -c "import sys; print sys.maxint"
    

    II. Modify line 5 of the Makefile in the root directory of scikit-learn so that it becomes:

    PYTHON ?= arch -32 /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python
    

    and modify line 11 to:

    BITS := $(shell $(PYTHON) -c 'import struct; print(8 * struct.calcsize("P"))')
    

    III. Type

    sudo make
    

    in the root directory of scikit-learn and you are good to go!

    Verification

    You can verify if 32-bit version of scikit-learn built successfully by typing:

    arch -32 /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python
    

    to enter 32-bit Python shell.

    After that, type:

    import sklearn
    

    to check if sklearn can now run on 32-bit Python.

    Hope this helps!

    July 20, 2016 11:52 PM

    John Detlefs
    (MDAnalysis)

    SciPy 2016!

    Last week I went to Austin, TX for SciPy 2016. I wasn't sure what to expect. How would people communicate? Would I fit in? What talks would interest me? Fortunately the conference was a huge success. I came away a far more confident and motivated programmer than when I went in.

    So what were the highlights of my experience at Scipy?

    On a personal level, I got to meet some of my coworkers, the members of the Beckstein Lab. Dr. Oliver Beckstein, David Dotson, and Sean Seyler are brilliant physicists and programmers who I have been working with on MDAnalysis and datreant. It was surreal to meet the people you have been working with over the internet for 3 months and get an idea of how they communicate and what they enjoy outside of work. It was the modern day equivalent of meeting penpals for the first time. I especially appreciated that David Dotson and Sean Seyler, both approximately four years my senior, provided invaluable advice to a recent graduate. If you’re reading this, thanks guys.

    The most valuable moments were the conversations I had in informal settings. There is a huge diversity in career trajectories among those attending Scipy, everyone has career advice and technical knowledge to impart upon a young graduate as long as you are willing to ask. I had excellent conversations with people from Clover Health, Apple data scientists, Andreas Klockner (Keynote Speaker), Brian Van de Ven (Bokeh Dev), Ana Ruvalcaba at Jupyter, the list goes on…

    Fascinating, Troubling, and Unexpected Insights

    • Scipy doubled in size in the last year!
    • So many free shirts (and stickers), don’t even bother coming with more than one shirt, also nobody wears professional attire.
    • Overheard some troubling comments made by men at Scipy, e.g. “Well, all the women are getting the jobs I’m applying for…” (said in a hallway group, this is not appropriate even if it was a joke)
    • The amount of beer involved in social events is kind of nuts; this probably comes with the territory of professional programming.
    • There are a lot of apologists for rude people, someone can be extremely nonverbally dismissive and when you bring it up to other people they will defend him (yes, always him) saying something to the effect of ‘he has been really busy recently’. Oliver Beckstein is a shining example of someone who is very busy and makes a conscious effort to always be thoughtful and kind.
    • Open source does not always imply open contribution, some companies represented at Scipy maintain open source projects while making the barriers to contribution prohibitively high.
    • A lot of people at Scipy apologize for their job (half-seriously) if they aren’t someone super-special like a matplotlib core developer or the inventor of Python. Your jobs are awesome people!
    • It is really hot in Austin.
    • git pull is just git fetch + git merge.
    • A lot of women in computing have joined and left male dominated organizations not because people are necessarily mean, but because they’ve been asked out too much or harassed in a similar fashion. Stay professional folks.
    • Cows turn inedible corn into edible steak.
    • As a young professional you have to work harder and take every moment more seriously than those older than you in order to get ahead.
    • Breakfast tacos are delicious.
    • Being able to get out of your comfort zone is a professional asset.
    • Slow down, take a breath, read things over, don’t make simple mistakes.

    Here are some talks I really enjoyed

    Datashader!

    Dating!

    Loo.py!

    Dask!

    July 20, 2016 12:00 AM


    July 19, 2016

    Sheikh Araf
    (coala)

    [GSoC16] Week 8 update

    Time flies and it’s been an astonishingly quick 8 weeks. I’ve finished work for the first coala Eclipse release, and the plug-in will be released with coala 0.8 in the next few days.

    Most of the coala team is at EuroPython so the development speed has slowed down. Nevertheless there are software development sprints this weekend and coala will be participating too.

    We also plan on having a mini-conference of our own, and will have lightning talks from GSoC students and other coala community members.

    As I’m nearing the end of my GSoC project, I’ve started reading up some material in order to get started with implementing the coafile editor. Currently I’m planing to extend either the AbstractTextEditor class or the EditorPart class, or something similar maybe.

    Cheers.

    July 19, 2016 03:30 PM

    Kuldeep Singh
    (kivy)

    After Mid-Term Evaluation

    Hello! guys,

    It’s been a month since I wrote a blog. I passed the Mid-Term and my mentors wrote a nice review for me. After my last blog post I have worked on a couple of features. Have a look at my PRs (Pull Requests) on the project Plyer.

    I am quiet happy with my work and hope to do more in future.

    Visit my previous blog here.

     


    by kiok46blog at July 19, 2016 11:58 AM

    GSoC week 8 roundup

    @cfelton wrote:

    There has been a little bit of a slump after the midterms;
    hopefully this will not continue throughout the rest of
    the program :slight_smile:

    The end will approach quickly.

    • August 15th: coding period ends.
    • August 15-20th: students submit final code and evaluations.
    • August 23-25th: mentors submit final evaluations.

    Overall the progress being made is satisfactory. I am looking
    forward to the next stage of the projects now that the majority
    of the implementation is complete: analysis of the designs,
    clean-up, and documentation.

    One topic I want to stress: a program like GSoC is very different
    from much of the work that is completed by an undergrad student.
    This effort is the student's exposition for this period of time
    (which isn't insignificant - doh, double negative). Meaning, the
    goal isn't to simply show us you can get something working - you
    are publishing your work to the public. Users should easily be
    able to use the cores developed and the subblocks within the
    cores. Developers, reviewers, and contributors should feel
    comfortable reading the code. The code should feel clean [1].
    You (the students) are publishing something into the public
    domain that carries your name - take great pride in your work:
    design, code, documentation, etc.

    As well as the readability stated above, the code should be
    analyzed for performance, efficiency, resource usage, etc.
    This information should be summarized in the blogs and final
    documentation.

    Student week8 summary (last blog, commits, PR):

    jpegenc:
    health 88%, coverage 97%
    @mkatsimpris: 10-Jul, >5, Y
    @Vikram9866: 25-Jun, >5, Y

    riscv:
    health 96%, coverage 91%
    @meetsha1995: 14-Jul, 1, N

    hdmi:
    health 94%, coverage 90%
    @srivatsan: 02-Jul, 0, N

    gemac:
    health 93%, coverage 92%
    @ravijain056, 04-Jul, 2, N

    pyleros:
    health missing, 70%
    @formulator, 26-Jun, 0, N

    Students and mentors:
    @mkatsimpris, @vikram, @meetshah1995, @Ravi_Jain, @sriramesh4,
    @forumulator,
    @jck, @josyb, @hgomersall, @martin, @guy.eschemann, @eldon.nelson,
    @nikolaos.kavvadias, @tdillon, @cfelton, @andreprado88, @jos.huisken

    Links to the student blogs and repositories:

    Merkourious, @mkatsimpris: gsoc blog, github repo
    Vikram, @Vikram9866: gsoc blog, github repo
    Meet, @meetshah1995, gsoc blog: github repo
    Srivatsan, @srivatsan-ramesh: gsoc blog, github repo
    Ravi @ravijain056: gsoc blog, github repo
    Pranjal, @forumulator: gsoc blog, github repo


    by @cfelton Christopher Felton at July 19, 2016 10:18 AM

    Yashu Seth
    (pgmpy)

    The Canonical Factor

    Hey, nice to have you all back. This post would be about the canonical factors used to represent pure Gaussian relations between random variables.

    We already have a generalized continuous factor class and even a joint Gaussian distribution class to handle Gaussian random variables, so why do we need another class? Well, at an abstract level the introduction of continuous random variables is not difficult. As we have seen in the ContinuousFactor class, we can use a range of different methods to represent the probability density functions.

    We can multiply factors, which in this case corresponds to multiplying the multidimensional continuous functions representing the factors; and we can marginalize out variables in a factor, which in this case is done using integration rather than summation. It is not difficult to show that, with these operations in hand, the sum-product inference algorithms that we used in the discrete case can be applied without change, and are guaranteed to lead to correct answers. But a closer look at the ContinuousFactor methods reveals that these implementations are not at all efficient. One could say that we can use these methods directly only on certain toy examples. Once the number of variables increases, we cannot always guarantee that these methods will perform in a feasible manner.

    In order to provide a better solution, we restrict our variables to the Gaussian universe and hence bring the JointGaussianDistribution class into the picture. While this representation is useful for certain sampling algorithms, a closer look reveals that it also cannot be used directly in the sum-product algorithms. Why? Because operations like product and reduce involve matrix inversions at each step. For a detailed study of these operations, you can refer to Products and Convolutions of Gaussian Probability Density Functions.

    So, in order to compactly describe the intermediate factors in a Gaussian network without the costly matrix inversions at each step, a simple parametric representation is used known as the Canonical Factor. This representation is closed under the basic operations used in inference: factor product, factor division, factor reduction, and marginalization. Thus, we can define a set of simple data structures that allow the inference process to be performed. Moreover, the integration operation required by marginalization is always well defined, and it is guaranteed to produce a finite integral under certain conditions; when it is well defined, it has a simple analytical solution.

    The Canonical Form Representation

    The simplest representation used in this setting represents the intermediate result as a log-quadratic form exp(Q(x)), where Q is some quadratic function. In the inference setting, it is useful to make the components of this representation more explicit. The canonical factor representation is characterized by the three parameters K, h and g. For details on this representation and these parameters, refer to Section 14.2.1.1 of the book Probabilistic Graphical Models: Principles and Techniques.
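
    For convenience, the canonical form over a set of variables X (as I understand the standard definition from that section; treat this as a reminder rather than a quote) is

    $$ C(X; K, h, g) = \exp\left(-\tfrac{1}{2} X^{\top} K X + h^{\top} X + g\right) $$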

    The CanonicalFactor class

    Similar to the JointGaussianDistribution class, the CanonicalFactor class is also derived from the ContinuousFactor class, but it has its own implementations of the methods required for the sum-product algorithms, which are much more efficient than its parent class's methods. Let us have a look at the API of a few methods in this class.

    API of the _operate method that is used to define the product and divide methods.

    >>> import numpy as np
    >>> from pgmpy.factors import CanonicalFactor
    >>> phi1 = CanonicalFactor(['x1', 'x2', 'x3'],
                               np.array([[1, -1, 0], [-1, 4, -2], [0, -2, 4]]),
                               np.array([[1], [4], [-1]]), -2)
    >>> phi2 = CanonicalFactor(['x1', 'x2'], np.array([[3, -2], [-2, 4]]),
                               np.array([[5], [-1]]), 1)
    
    >>> phi3 = phi1 * phi2
    >>> phi3.K
    array([[ 4., -3.,  0.],
           [-3.,  8., -2.],
           [ 0., -2.,  4.]])
    >>> phi3.h
    array([ 6.,  3., -1.])
    >>> phi3.g
    -1
    
    >>> phi4 = phi1 / phi2
    >>> phi4.K
    array([[-2.,  1.,  0.],
           [ 1.,  0., -2.],
           [ 0., -2.,  4.]])
    >>> phi4.h
    array([-4.,  5., -1.])
    >>> phi4.g
    -3
            
    

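    As a side note (this is not pgmpy code, just a small NumPy sketch to illustrate why these operations are cheap): in canonical form the factor product simply adds the scope-aligned K, h and g parameters, and division subtracts them, so no matrix inversion is needed. The padding below assumes phi2's scope ['x1', 'x2'] lines up with the first two variables of phi1's scope.

    import numpy as np

    K1 = np.array([[1, -1, 0], [-1, 4, -2], [0, -2, 4]])
    h1 = np.array([[1], [4], [-1]])
    g1 = -2

    K2 = np.zeros((3, 3)); K2[:2, :2] = [[3, -2], [-2, 4]]  # pad phi2 to phi1's scope
    h2 = np.zeros((3, 1)); h2[:2] = [[5], [-1]]
    g2 = 1

    K3, h3, g3 = K1 + K2, h1 + h2, g1 + g2  # factor product; subtracting gives division
    print(K3, h3.ravel(), g3)               # matches phi3.K, phi3.h and phi3.g above
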
    This class also has a method, to_joint_gaussian, to convert the canonical representation back into a joint Gaussian distribution.

    >>> import numpy as np
    >>> from pgmpy.factors import CanonicalFactor
    >>> phi = CanonicalFactor(['x1', 'x2'], np.array([[3, -2], [-2, 4]]),
                              np.array([[5], [-1]]), 1)
    >>> jgd = phi.to_joint_gaussian()
    >>> jgd.variables
    ['x1', 'x2']
    >>> jgd.covariance
    array([[ 0.5  ,  0.25 ],
           [ 0.25 ,  0.375]])
    >>> jgd.mean
    array([[ 2.25 ],
           [ 0.875]])
    
    

    Other than these methods the class has the usual methods like marginalize, reduce and assignment. Details of the entire class can be found here.

    So with this I come to the end of this post. Thanks, once again for going through it. Hope to see you next time. Bye :-)

    July 19, 2016 12:00 AM

    July 17, 2016

    Yen
    (scikit-learn)

    Using Function Pointer to Maximize Code Reusability in Cython

    When writing C, function pointers are extremely useful because they let us define a callback function, i.e., a way to parametrize a function. This means that some part of the function's behavior is not hard-coded into the function itself, but into the callback function provided by the user. Callers can make the function behave differently by passing different callback functions. A classic example is qsort() from the C standard library, which takes its sorting criterion as a pointer to a comparison function.
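
    As a quick Python-flavoured analogue of the same callback idea (not taken from the original post): the built-in sorted() lets the caller parametrize the sorting criterion with a key callable.

    # Callers change the behaviour of sorted() by passing different callables,
    # much like qsort() is parametrized by a comparison function pointer.
    words = ["banana", "fig", "Apple"]
    print(sorted(words, key=len))        # ['fig', 'Apple', 'banana']
    print(sorted(words, key=str.lower))  # ['Apple', 'banana', 'fig']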

    Besides the benefit above, we can also use function pointers to avoid redundant control flow code such as if, else.

    In this blog post, I'm going to explain how we can combine function pointers and Cython fused types in an easy way to make function pointers more powerful than ever, and therefore maximize code reusability in Cython.

    Function Pointer

    Let's start with why function pointers can help us address the code duplication issue.

    Consider the following two C functions: one adds 1 and the other adds 2 to the function argument:

    float add_one(float x) {
    	return x+1;
    }
    
    float add_two(float x) {
    	return x+2;
    }
    

    Now close your eyes and try your best to imagine that the operation x+1 performed in add_one and the operation x+2 performed in add_two are so costly that they must be implemented in C, or they would take several hours to complete.

    Okay, based on the imagined reason above, we indeed need to import the C functions above to speed up our Cython function, which will return (x+1)*2+1 if x is an odd number, or (x+2)*2+2 if x is an even number:

    cdef float linear_transform(float x):
        """
        This function will return (x+1)*2+1 if x is odd
                                  (x+2)*2+2 if x is even
        """
        cdef float ans

        if x % 2 == 1: # x is odd
            ans = add_one(x)
        else:          # x is even
            ans = add_two(x)

        ans *= 2

        # Where code duplication happens: we have to branch again!
        if x % 2 == 1: # x is odd
            ans = add_one(ans)
        else:          # x is even
            ans = add_two(ans)

        return ans
    

    As one can see, there is code duplication at the end of this function, because we have to check again whether we need to apply add_one or add_two to the intermediate result.

    To address this issue, we can define a function pointer and let it point to the correct function once we know whether x is an odd or an even number. By doing so, we don't have to repeat the annoying if, else anymore.

    The above code snippet can be reduced to:

    ctypedef float (*ADD)(float x)

    cdef float linear_transform(float x):
        """
        This function will return (x+1)*2+1 if x is odd
                                  (x+2)*2+2 if x is even
        """
        cdef ADD add
        cdef float ans

        if x % 2 == 1: # x is odd
            add = add_one
        else:          # x is even
            add = add_two

        ans = add(x)    # first addition
        ans *= 2        # double it
        ans = add(ans)  # second addition, no second if/else needed

        return ans
    

    Now the code snippet is more readable, and, for sure, function pointers do make our code look neat!

    Note: Although there is only one duplication in the above example, there may be a lot in real code, which shows a function pointer's value more obviously.

    Function Pointer’s Limitation

    However, function pointers are not omnipotent. Although they provide a good way to write generic code, unfortunately they don't provide you with type generality. What do I mean?

    Consider that we now have the following two C functions that both add 1 to the argument variable, one for type float and one for type double:

    float add_one_float(float x) {
    	return x+1;
    }
    
    double add_one_double(double x) {
    	return x+1;
    }
    

    Now let’s do the imagination process again, pretending that these two extern C functions can speed up the following Cython function linear_transform :

    cdef floating linear_transform(floating x):
        """
        This function will return (x+1)*2+1 with the
        same type as the input argument
        """
        cdef floating ans

        if floating is float:
            ans = add_one_float(x)
        elif floating is double:
            ans = add_one_double(x)

        ans *= 2

        # Where code duplication happens again!
        if floating is float:
            ans = add_one_float(ans)
        elif floating is double:
            ans = add_one_double(ans)

        return ans
    

    Don't be scared if you haven't seen floating before; to be brief, floating here refers to either type float or type double. It is a feature called fused types in Cython, which basically serves the same role as templates in C++ or generics in Java.

    Note that now we can't define a function pointer and let it point to the correct function like we did in our first example, because the C functions add_one_float and add_one_double have different function signatures. Since C is a strongly typed language, it's hard to define a function pointer that can point to functions of different types. (This is why, for example, the standard library qsort still requires a comparison function that takes void* pointers.)

    NOTE: Usage of void* pointer in C is beyond the scope of this blog post, you can find a simple introduction here. But remember, it’s dangerous.

    Function Pointer + Fused Types

    Fortunately, fused types are here to rescue us. With this useful tool, we can actually define a fused-type function pointer to solve the above problem!

    ctypedef floating (*ADD)(floating x)

    cdef floating linear_transform(floating x):
        """
        This function will return (x+1)*2+1 with the
        same type as the input argument
        """
        cdef ADD add_one
        cdef floating ans

        if floating is float:
            add_one = add_one_float
        elif floating is double:
            add_one = add_one_double

        ans = add_one(x)   # (x+1)
        ans *= 2           # (x+1)*2
        ans = add_one(ans) # (x+1)*2+1

        return ans
    

    Note that since floating can represent either float or double, a function pointer of type floating has the ability to achieve type generality, which was not available before we combined fused types with function pointers.

    Finally, we are going to demystify the secret of this magic trick performed by Cython and make sure that it works properly.

    Demystifying How It Work

    In order to know how a Cython fused-types function pointer works, let's become a ninja and dive deep to peek at the C code generated by Cython.

    In the generated C code of the above Cython function, there is no if floating is float: anymore. Actually, to accommodate the fused type floating, Cython generates two C functions, one for float and another for double.

    And in the generated C function for float, it directly assigns the function pointer we declared to the imported C function that will actually be called when x is of type float:

    __pyx_v_add_one = add_one_float;
    

    Same as the float case, generated C function for double also includes:

    __pyx_v_add_one = add_one_double;
    

    which directly assigns function pointer to the correct imported function.

    In fact, this allows for an optimization by the C compiler, since it can identify variables that remain unchanged within a function. It would find out that the function pointer __pyx_v_add_one is set only once, to a constant, i.e., an imported C function. Hence, after the object code is linked, __pyx_v_add_one will refer directly to that C function.

    On the contrary, the Python interpreter can provide only a little static analysis and code optimization, since the language design doesn't have a compile phase.

    In sum, always implement your computation heavy code in Cython instead of Python.

    Summary

    Combining function pointers with fused types raises their power to another level. Actually, it is a generalized version of the original function pointer, and it can be used in lots of places to make our code look more readable and cleaner. Also, it is often a good idea to check the C code generated by Cython so as to make sure it's doing what you hoped.

    See you next time!

    July 17, 2016 11:52 PM

    July 16, 2016

    Upendra Kumar
    (Core Python)

    New WireFrame Diagrams for tkinter GUI

    Hey, I have designed new wireframes for my application, which I will use in my documentation. I am posting them here:

    1. Welcome Page

    welcomepage

    2. InstallFromPyPI

    installfrompypi

    3. InstallFromLocalArchive

    installfromlocalarchive

    4. Update Page

    updatepage

     


    by scorpiocoder at July 16, 2016 05:47 PM

    Riddhish Bhalodia
    (dipy)

    Registration experiments

    What…

    Basically, data registration (in our case of 3D MRI scans) is essentially the process of bringing two different datasets (differing in structure, distortion, resolution…) into one coordinate frame, so that any kind of further processing involving both datasets becomes much simpler.

    Why..

    As to why registration is important for us, there is a simple answer. We are aiming at template-assisted brain extraction, so the first step is to align (register) the input data to the template data. That will allow for further processing of the images.

    How..

    There are different methods for registration and some of them are available in DIPY; the most important are affine registration and diffeomorphic (non-linear) registration. We will be using these routines for several experiments, in combination with simple brain extraction by median_otsu.
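
    For concreteness, here is a rough sketch of the affine step using DIPY's registration API (the metric choice and the level_iters, sigmas and factors values are illustrative, not necessarily the ones used in these experiments):

    from dipy.align.imaffine import (AffineRegistration, MutualInformationMetric,
                                     transform_centers_of_mass)
    from dipy.align.transforms import AffineTransform3D

    def affine_register(static, static_affine, moving, moving_affine):
        # Mutual information metric with 32 histogram bins, full sampling
        metric = MutualInformationMetric(nbins=32, sampling_proportion=None)
        affreg = AffineRegistration(metric=metric,
                                    level_iters=[10000, 1000, 100],
                                    sigmas=[3.0, 1.0, 0.0],
                                    factors=[4, 2, 1])
        # Start from a center-of-mass alignment
        c_of_mass = transform_centers_of_mass(static, static_affine,
                                              moving, moving_affine)
        affine_map = affreg.optimize(static, moving, AffineTransform3D(), None,
                                     static_affine, moving_affine,
                                     starting_affine=c_of_mass.affine)
        return affine_map.transform(moving)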

    Datasets..

    Experiments

    I ran several experiments to test different registration combinations and to see which would give the best results.

    [A] Affine registration with raw input, raw template 

    levels = [10000,1000,100]

    Figure: Affine registration, with raw input and raw template data.
    Figure: Another representation (the middle slice) of the affine registration.

     

    We can see that we still need a slight correction, so we follow this up with the next experiment, using non-linear registration on the images to correct the alignment even more.

    [B] Non-Linear Registration with affine transformed input and identity pre_align

    With default parameters and levels = [10,10,5], the raw data and the affine-transformed template (along with the skull) were given as inputs to the diffeomorphic registration.

    Figure: Non-linear registration using the transformed template and pre_align set to identity.

    The above figure does not show much difference from the affine result, so let's view it the other way.

    Figure: Another representation (the middle slice).

    The above figure shows some changes from the affine result and looks a little better. We can see that the skull of the template is causing some problems, so we will also look at the result that uses just the extracted brain from the template.

    [C] Non-Linear Registration with pre_align = affine transformation

    Here too the levels = [10,10,5], but the inputs are the raw input data and the raw template data.

    Figure: Non-linear registration with pre_align = affine transformation.

    Again we can't see much difference, so let's see the other image.

    Figure: Another representation (the middle slice).

    Again we see that the skull is causing a problem for the diffeomorphic correction, hence the next experiment.

    [D] Using only the brain from the template, we repeat part C

    Figure: Only-brain template, aligned with the part C method.

    It seems a great fit from the above figure, just slightly oversized. Now let's see the alternate representation.

    Figure: Another representation (the middle slice).

    The above registration seems nice; yes, a few corrections are still needed, so I started tuning the parameters of the non-linear registration.

    Another thing to notice is that the boundary of the transformed template output is slightly skewed from the brain in the input data, so we have a patch-based method which should correct this! The results using it will be posted in the next blog.

    Next Up..

    I will have another post very soon about how this registration helped me get to a good brain extraction algorithm, and I will describe the algorithm as well.

    Then I will have to test it on several datasets to check its correctness, and compare it with median_otsu.

     

     


    by riddhishbgsoc2016 at July 16, 2016 07:18 AM

    July 15, 2016

    sahmed95
    (dipy)

    Model fitting in Python - A basic tutorial on least square fitting, unit testing and plotting Magnetic Resonance images



    Fitting a model using scipy.optimize.leastsq

    I am halfway through my Google Summer of Code project with Dipy under the Python Software Foundation. I have published a few short posts about the project before, but in this post I am going to walk through the entire project from the start. This can also be a good tutorial if you want to learn about curve fitting in Python, unit testing, and getting started with Dipy, a diffusion imaging library in Python.
    The first part of this post will cover general curve fitting with leastsq, plotting, and writing unit tests. The second part will focus on Dipy and the specific model I am working on - the intravoxel incoherent motion (IVIM) model.

    Development branch for the IVIM model in Github : (https://github.com/nipy/dipy/pull/1058)

    Curve fitting with leastsq

    Mathematical models proposed to fit observed data can be characterized by several parameters. The simplest model is a straight line characterized by the slope (m) and intercept (C).

    y = mx + C

    Given a set of data points, we can determine the parameters m and C by using least squares regression which minimizes the error between the actual data points and those predicted by the model function. To demonstrate how to use leastsq for fitting let us generate some data points by importing the library numpy, defining a function and making a scatter plot.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import leastsq
    # This is a magic command that renders the plots in the IPython notebook itself
    %matplotlib inline
    Let us define a model function which is an exponential decay curve and depends on two parameters.

    y = A exp(-x * B)

    Documentation is important and we should write what each function does and specify the input and output parameters for each function.