Saturday, June 20, 2015

GSoC '15 Progress: Second Report

The past couple of weeks have been fun! I learnt many new and interesting things about Python.

The modified source code of METIS has been merged, followed by its Cython wrappers. Thanks again to Yingchong Situ for all his legacy work. Still, things were not smooth: there were lots of hiccups and things to learn.

One of the modules in the package was named types, and it was being imported with an absolute import. I was unaware that types is also a module in Python's standard library, so the resulting behaviour was a mystery to me. Thanks to IPython, which told me this:
In [2]: types                                                           
Out[2]: <module 'types' from '/usr/lib/python2.7/types.pyc'>            


This alerted me to the differences between absolute and relative imports, and to the pros and cons of each. Now one may ask (does anyone read these blog posts?) why I didn't go with the following in the first place:


In [3]: from . import types


Actually, networkx-metis is supposed to be installed as a namespace package in networkx, and the presence of __init__.py is prohibited in a namespace package. Hence from . import types would raise an "Attempted relative import in non-package" error.
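
To make the difference concrete, here is a minimal sketch written for Python 3. It assumes a hypothetical regular package called demo_pkg (with an __init__.py, so unlike the namespace setup described above) that contains its own types module. The names demo_pkg, user and MARKER are made up for illustration and have nothing to do with networkx-metis.

```python
import importlib
import os
import shutil
import sys
import tempfile


def build_demo_package(root):
    """Create a hypothetical package 'demo_pkg' that has its own types module."""
    pkg_dir = os.path.join(root, "demo_pkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg_dir, "types.py"), "w") as f:
        f.write("MARKER = 'local types module'\n")
    with open(os.path.join(pkg_dir, "user.py"), "w") as f:
        f.write(
            # In Python 2.7 this line makes 'import types' absolute;
            # Python 3 behaves that way by default.
            "from __future__ import absolute_import\n"
            "import types as absolute_types\n"
            "from . import types as relative_types\n"
        )


def load_demo():
    """Import demo_pkg.user from a temporary directory and return the module."""
    root = tempfile.mkdtemp()
    build_demo_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        return importlib.import_module("demo_pkg.user")
    finally:
        sys.path.remove(root)
        shutil.rmtree(root, ignore_errors=True)


user = load_demo()
print(user.absolute_types.__name__)   # types (the standard library module)
print(user.relative_types.__name__)   # demo_pkg.types (the package's own module)
print(user.relative_types.MARKER)     # local types module
```

The absolute import picks up the standard library module, while the relative import finds the sibling module inside the package. The relative form only works because demo_pkg is a regular package here.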

We are now following Google's style guide for Python [1].

Since networkx-metis is licensed under the Apache License, Version 2.0, we also had to add a file named NOTICE clearly stating the modifications we made to METIS, the library that networkx-metis is a derivative work of.

The next important items on my TODO list are:
  • Finalizing everything for namespace packaging
  • Setting up Travis CI
  • Hosting the docs on readthedocs.org
That's all for now.

Happy Coding!

[1] https://google-styleguide.googlecode.com/svn/trunk/pyguide.html

Tuesday, June 2, 2015

GSoC '15 Progress : First report


We are past the community bonding period and are in the second week of the coding period. My experience has been very good till now!

I started working on building Python wrappers around the METIS library, which is written in C and does graph partitioning. It is extremely fast. The feature was not present in NetworkX because graph partitioning is NP-hard, so it needs a lot of approximations, and Python isn't so good at speed either. A big part of this work was already done by my mentor Yingchong Situ last year. However, there were some hiccups:

METIS is licensed under the Apache License, Version 2.0, yet it uses some code licensed under the GNU Lesser General Public License (LGPL). NetworkX has a BSD license. The work is supposed to be hosted under the NetworkX umbrella and shipped as an add-on named networkx-metis. So a big question was what the appropriate license for this add-on should be. We had pretty interesting discussions about it with Aric, Dan and other NetworkX developers, and finally decided to remove the LGPL dependencies from the source code and go with Apache. This required some changes. I had to replace a couple of C files that made use of qsort with a C++ file using std::sort. In the process I learnt about extern "C", which gives a C++ function C linkage: the compiler does not mangle its name, so the library exports an ABI that C code can link against. Client C code can then call the function through a C-compatible header file that contains just its declaration.
I also learnt how handy C macros are for writing functions whose data types are not fixed in advance.

I got my first PR merged! Next up are the wrappers. Once those and the setup requirements are done, I think the add-on will be ready to go.

That's all for now.

Happy coding!

Monday, June 1, 2015

Just a blog : A quick guide to git and GitHub (Part 2)

So, now we are going to take a step further. In this post, we'll deal with one of the most important concepts of distributed collaboration: branching. We will also learn about contributing to real-world open source libraries. Several concepts are involved, and I'll try to deal with the most relevant ones.

Also, this time I'll try to explain things in a better way.

Branching : There's a mantra: branch early and branch often. Branches are used to develop new features simultaneously but in isolation from each other. Suppose you are at some point in your commit history. A new idea for your project comes to mind and you want to implement it. Let's name the new feature feature_x. Now why do we need branching? Why not implement it directly? What's up with master? What is that?

master is the branch your project is on when you initialize a git repository. It is, by default, the mainline. Now suppose you start adding your new feature right in master, and at some point you realize you want to implement another feature, feature_y. Obviously you'll have to stop one piece of work to proceed with the other. The new feature might have bugs, or you may need to add tests. You might also want it to be reviewed, and you'll have to wait for the review. At some point you might find out that this new feature is hopelessly broken and you want to go back to where you started. But then you'll realize that you've also made commits for the other feature you were working on at the same time. At that point you might just give up on your project! And that is unfortunate, because you forgot the mantra. Branch! Branch! And branch!

Okay, back in time. We have a new feature, feature_x. You do this:
 $ git branch feature_x  
 $ git checkout feature_x  
The first command creates a new branch feature_x and the second switches you from master to it. This means that your next commit will not affect master. Cheers! You've got freedom. That's what git is for.
The two commands above can also be combined into one:
 $ git checkout -b feature_x  
This creates the branch and moves you to it simultaneously.

You can check which branch you are currently on with
 $ git branch  
 * feature_x  
   master  
The * denotes that you are on feature_x. Still, this branch won't be visible on GitHub until you push it with
 $ git push origin feature_x  
Okay, now it's add, commit and push time! The commit you made has taken your feature_x branch one step ahead of master. If you think your new feature is ready, it's time to merge. Check out master and merge your new branch into it.
 $ git checkout master  
 $ git merge feature_x  
You can check the log now (git log) which has been updated with your new feature.

After your work is done, you can delete the branch locally with
 $ git branch -d feature_x  
and on the GitHub repository with
 $ git push origin :feature_x  

Fork : Now we are going to make a contribution to an open source library. I chose NetworkX. It's a Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.
Go to https://github.com/networkx/networkx. At the top right, you'll see three buttons: Watch, Star and Fork.
Watch means you'll get notifications for all the activity on the repository.
Star is just a GitHub feature for bookmarking a project you like, much like upvoting an answer on Quora.
Fork gives you your own copy of the code under your username.

So, after you fork the repo, you can see the exact same package at https://github.com/<username>/networkx. Pretty cool, huh!

Clone the forked repository to your computer. Make a new feature branch. Add and commit your changes. Push the branch to GitHub. DO NOT merge the branch into your master, because your master should not hold your own changes; it should stay in sync with the official source code repository.

After you push the changes with git push origin feature_x, you will find a 'Compare & Pull Request' option at https://github.com/<username>/networkx.

This option lets you compare your changes against the original networkx repository and then create a pull request.

A pull request is a method of submitting your contributions. The PR is reviewed, analyzed, discussed and then can be either merged or closed. So, go on. Choose a project and get your first PR merged. And once again, do not forget to make a branch.

Dealing with upstream: Your local repository often gets outdated because of ongoing changes in the official repository. There's a way to update it whenever you want. And you must update your master before you make a new branch!
To add networkx/networkx as an upstream remote, do
 $ git remote add upstream https://github.com/networkx/networkx  

So now, whenever you have to update your local master branch, do this
 $ git fetch upstream  
 $ git merge upstream/master  
To update your GitHub repo, do
 $ git push origin  

Welcome to the open source world!

Happy coding!