Showing posts from 2011

split function behavior differences

A little note from my debugging experience: the split function behaves differently (and, I would say, unexpectedly) on an empty string in different programming languages, which can cause hard-to-find bugs, especially if you use many languages at once.

I've put together a table for the popular programming languages:

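| Language   | Split without parameters | Split with parameter              |
|------------|--------------------------|-----------------------------------|
| Python     | ''.split() = []          | ''.split(',') = ['']              |
| Ruby       | ''.split() = []          | ''.split(',') = []                |
| JavaScript | ''.split() = ['']        | ''.split(',') = ['']              |
| PHP        | N/A                      | explode(',', '') = array(0 => '') |
| Java       | N/A                      | "".split(",") = {""}              |
| C#         | "".Split() = {""}        | "".Split(',') = {""}              |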
As you can see, sometimes it returns an empty array and sometimes an array with one empty element. So please be careful with the split operation :)
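For Python, the difference is easy to confirm in an interpreter. The helper `split_nonempty` below is a hypothetical name of my own: a minimal sketch that normalizes the empty-string case so it always yields an empty list, the way Ruby does for `''`.

```python
# Python's str.split treats the empty string differently depending on
# whether a separator is passed.
assert ''.split() == []
assert ''.split(',') == ['']


def split_nonempty(text, sep=None):
    """Split text, but return [] for an empty input regardless of sep.

    Hypothetical helper: only the empty-string case is normalized; every
    other input is passed straight to str.split.
    """
    if not text:
        return []
    return text.split(sep)
```

With it, `split_nonempty('', ',')` and `split_nonempty('')` both give `[]`, while `split_nonempty('a,b', ',')` still gives `['a', 'b']`.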

Python templating comparison by memory consumption

Another comparison between:

- standard formatting;
- the more advanced standard string.Template;
- Mako;
- Genshi;
- Jinja2.

Here is the code I used for measuring:

#!/usr/bin/env python
import sys

NAME = 'name'

def render1():
    template = "<p>Hello %s!</p>"
    return template % NAME

def render2():
    from string import Template
    template = Template("<p>Hello ${name}!</p>")
    return template.substitute(dict(name=NAME))

def render3():
    from mako.template import Template
    template = Template("<p>Hello ${name}!</p>")
    return template.render(name=NAME)

def render4():
    from genshi.template import MarkupTemplate
    tmpl = MarkupTemplate('<p>Hello $name!</p>')
    stream = tmpl.generate(name=NAME)
    return stream.render('xhtml')

def render5():
    from jinja2 import Template
    template = Template('<p>Hello {{ name }}!</p>')…
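The excerpt doesn't show how the memory was actually measured. One possible approach, sketched here under my own assumptions (Python 3, Linux, the standard `resource` module), is to render each template in a fresh interpreter and read that process's peak RSS, so that modules imported for one engine can't inflate another engine's numbers. `SNIPPETS` and `peak_rss_kb` are hypothetical names, and only the two standard-library variants are included so the sketch runs without Mako, Genshi or Jinja2 installed.

```python
import subprocess
import sys

# Each snippet renders the same tiny template. Third-party engines would be
# added the same way (as source text) if they are installed.
SNIPPETS = {
    'percent': "'<p>Hello %s!</p>' % 'name'",
    'string.Template': (
        "from string import Template\n"
        "Template('<p>Hello ${name}!</p>').substitute(name='name')"
    ),
}


def peak_rss_kb(snippet):
    """Run snippet in a fresh interpreter and return that process's peak RSS.

    ru_maxrss is reported in kilobytes on Linux (in bytes on macOS).
    The resource module is Unix-only.
    """
    code = (
        "import resource\n"
        f"{snippet}\n"
        "print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)\n"
    )
    result = subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

Usage: `for name, snippet in SNIPPETS.items(): print(name, peak_rss_kb(snippet))`. Running each engine in its own process is the main design choice here; measuring them one after another in a single process would mostly report the cumulative size of everything imported so far.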

Web application framework comparison by memory consumption

Memory consumption is a particular concern in my area of software development right now, but I did some research recently and these results may be useful to others. Of course, I know that a precise comparison is very difficult to carry out, but I only needed the overall picture. And I must admit that the results are pretty interesting and even frustrating (at least for me).

As a basis I took the so-starving project and measured the initial RSS (Resident Set Size) of each process (the local development web servers). Platform: x86_64 Linux (the latest Ubuntu with all updates).
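On Linux, the RSS of a running process (a development web server, for instance) can be read from procfs. This is a minimal sketch assuming the Linux `/proc/<pid>/status` format, where the `VmRSS` line is reported in kB. `parse_vmrss_kb` and `rss_kb` are hypothetical helper names, and this is not necessarily how the numbers in this post were collected.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith('VmRSS:'):
            # The line looks like "VmRSS:\t    7336 kB".
            return int(line.split()[1])
    raise ValueError('VmRSS line not found')


def rss_kb(pid='self'):
    """Read the current RSS of a process on Linux via procfs."""
    with open(f'/proc/{pid}/status') as status:
        return parse_vmrss_kb(status.read())
```

Usage: start the web server, then call `rss_kb(<server pid>)` from another Python session.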

As a reference, here is the RSS of the interpreters in interactive console mode:

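| Interpreter            | Version | RSS (kB) |
|------------------------|---------|----------|
| stackless python       | 2.6.4   | 3916     |
| ruby (via irb)         | 1.8.7   | 4664     |
| python                 | 2.7.1   | 5624     |
| php                    | 5.3.5   | 6924     |
| v8 (via node.js shell) | …       | …        |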
One more reference: the simplest WSGI app (the example in the Python documentation). Its RSS is 7336 kB, so I assume it's almost impossible to consume less memory without additional tweaks and optimizatio…
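For comparison, here is a minimal WSGI application along the lines of the `wsgiref` example in the Python documentation, written for Python 3 (the measurements in this post were made on Python 2, where the response body would be a plain `str`). `serve` is a hypothetical wrapper name. It is defined but not called, so importing the module doesn't start a server.

```python
from wsgiref.simple_server import make_server


def simple_app(environ, start_response):
    """The smallest useful WSGI application: always answers Hello World."""
    status = '200 OK'
    headers = [('Content-Type', 'text/plain; charset=utf-8')]
    start_response(status, headers)
    return [b'Hello World']


def serve(port=8000):
    """Serve simple_app until interrupted (Ctrl+C)."""
    with make_server('', port, simple_app) as httpd:
        httpd.serve_forever()
```

Usage: call `serve()`, then check the process's RSS from another shell (for example with `ps`).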

Packing executables

One of the biggest challenges on embedded platforms is the limit on file sizes. Here is a hint on how to make executables smaller: strip them and pack them.

- strip is a tool from GNU binutils that discards symbols; the platform toolchain usually has one.
- upx is an excellent executable packer; it can be downloaded as a binary or as sources from the UPX site.

As an example, here is the packing of the Python 2.6.7 binary:

$ ls -s --block-size=KB python
6493kB python
$ strip -s python
$ ls -s --block-size=KB python
1696kB python
$ upx --best python
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2010
UPX 3.05        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2010

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
   1692400 ->    608896   35.98%  linux/ElfAMD   python

Packed 1 file.
$ ls -s --block-size=KB python
611kB python

So, we've got 10:1 …
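The two steps can be scripted. A minimal sketch, assuming `strip` and `upx` are on the PATH and using exactly the flags from the transcript above (`strip -s`, `upx --best`). `shrink` and `ratio` are hypothetical helper names. Both tools modify the file in place, so work on a copy.

```python
import os
import subprocess


def shrink(path):
    """Strip symbols from an executable, then pack it with UPX.

    Returns (size_before, size_after) in bytes. Raises CalledProcessError
    if either tool fails.
    """
    before = os.path.getsize(path)
    subprocess.run(['strip', '-s', path], check=True)
    subprocess.run(['upx', '--best', path], check=True)
    after = os.path.getsize(path)
    return before, after


def ratio(before, after):
    """Size reduction factor, e.g. 10.0 for a 10:1 reduction."""
    return before / after
```

Usage: `ratio(*shrink('python'))`. With the sizes from the transcript, `ratio(6493, 611)` is about 10.6.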

Compiling Python: Modules/Setup

A little hint for Python developers who use it on embedded or unconventional platforms (like Cray supercomputers, if you're lucky): Python can be compiled and used without any dynamic libraries. I ran into a problem with a stripped …: some Python shared objects (like …) try to use it, but can't find anything because it's stripped. The only option I had was to use Python without these shared objects.

Fortunately, Python supports this out of the box. After configuring it, you can use the Modules/Setup file to choose which modules are compiled into the Python binary:
The build process works like this:
 1. Build all modules that are declared as static in Modules/Setup,
    combine them into libpythonxy.a, combine that into python.
 2. Build all modules that are listed as shared in Modules/Setup.
 3. Invoke setup.py. That builds all modules that
    a) are not builtin, and
    b) are not listed in Modules/Setup, and
    c) can be build on the target

For example, if you wa…
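To double-check which modules end up static and which shared, the Setup entries can be grouped with a small script. This is a hypothetical helper, not part of Python's build tooling. It assumes the Setup conventions as I understand them: `#` starts a comment, a trailing backslash continues an entry, `*static*` and `*shared*` switch the mode, `NAME=value` lines are variable definitions, and entries before any marker count as static.

```python
def classify_setup(text):
    """Group module names from Modules/Setup-style text into static/shared."""
    groups = {'static': [], 'shared': []}
    mode = 'static'  # assumption: entries before any marker are static
    lines, pending = [], ''
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].rstrip()  # drop comments
        if line.endswith('\\'):  # a backslash continues the entry
            pending += line[:-1] + ' '
            continue
        lines.append(pending + line)
        pending = ''
    if pending:
        lines.append(pending)
    for line in lines:
        words = line.split()
        if not words:
            continue
        if words[0] in ('*static*', '*shared*'):
            mode = words[0].strip('*')
        elif words[0].startswith('*') or '=' in words[0]:
            continue  # other markers and variable definitions
        else:
            groups[mode].append(words[0])  # first word is the module name
    return groups
```

Usage: `classify_setup(open('Modules/Setup').read())` returns the module names listed under each mode.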

Embedding Python

I just want to share some useful links about embedding Python in a C-based application:
- the main article: Embedding Python in Another Application;
- an additional article that shows the peculiarities of multithreading, sockets and shared memory: Embedding Python in C/C++ (Part 1, Part 2);
- Cython (or Pyrex) can be used to reduce the handwritten code for Python interoperability: A quick Cython introduction.

If you have doubts about Python's size, there are some minimal implementations: see the Embedded Python article. Let me quote tinypy:
tinypy is a minimalist implementation of python in 64k of code
What more could you possibly want??
a pony?

However, I highly recommend using the classic CPython implementation (mainly because it has a huge number of contributors and supporters, and excellent documentation). It can be stripped down to 1-2 megabytes depending on your requirements.

Introduction to ReviewBoard

Review Board is a powerful web-based code review tool that offers developers an easy way to handle code reviews. It scales well from small projects to large companies and offers a variety of tools to take much of the stress and time out of the code review process. Review Board is written in the Python programming language and makes use of the Django web framework.


Install the auxiliary packages and their dependencies, if needed:

$ sudo apt-get install python-setuptools
$ sudo apt-get install python-svn
$ sudo apt-get install python-subversion
$ sudo apt-get install apache2
$ sudo apt-get install libapache2-mod-python
$ sudo apt-get install git

Clone the ReviewBoard package and install it:

$ git clone git://
$ cd reviewboard
$ sudo python setup.py develop

Also install the post-review tool:

$ sudo easy_install -U RBTools

Set up the site for ReviewBoard (this is for an Apache/SQLite backend; otherwise, see the Creating Sites reference):

$ sudo rb-site i…

Your Language Sucks (and about PHP again)

I've found a good wiki article about the faults of programming languages and want to share the link. The "winner" is PHP as usual, but Python is also mentioned (as is Ruby :) ). I agree with almost everything there, but in my eyes Python is still the best choice for programming.
Nevertheless, I also want to point out some other recent links related to PHP (I really shouldn't, but I just can't help it):

- PHP Sucks
- PHP Must Die
- What are the horrors of PHP?
- What factors during the development of PHP contributed to it being such a poorly designed language?

And a quote from the interview with Rasmus Lerdorf (the creator of PHP): I don't know how to stop it, there was never any intent to write a programming language [...] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way.

Facebook and PHP

There is a common fallacy: "If everybody uses it, I have to use it too; millions of people can't be wrong." Apparently they can, and a huge codebase, support and knowledge mean nothing; otherwise we would still be using Fortran, Cobol, Basic and other nearly dead monsters.

There is also another common mistake: "If a big corporation uses it, I have to use it too." That's very doubtful: such decisions are almost always made in a hurry and/or by the wrong people and/or without serious consideration. And after some time it's difficult to reverse the earlier decision, because that would require huge effort. A good example is Facebook.

Let me quote the presentation HipHop for PHP Tech Tasting:
PHP is problematic for Facebook:
- High CPU usage
- High memory usage
- Reuse of PHP logic in other systems
- Extensions are hard to write for most PHP developers

But a huge codebase, a strange affection towards PHP (in what universe are "loose typing and universal array" good t…

Python vs JS vs PHP for embedded systems

I was asked which programming language is preferable for developing websites on an embedded system (hence with limited resources). Here is my small investigation in table form:

|                                 | Python                      | JS (node.js)                                              | PHP                                  |
|---------------------------------|-----------------------------|-----------------------------------------------------------|--------------------------------------|
| Runtime size (Gentoo Linux x86) | ~1.4Mb (/usr/lib/…)         | … (/usr/lib/…)                                            | … (/usr/bin/php)                     |
| Type system                     | Strong typing               | Weak typing                                               | Weak typing                          |
| Optimization                    | Psyco (up to 100x)          | V8 JavaScript engine is already optimized                 | PHP Accelerators                     |
| Documentation                   | Excellent                   | Average                                                   | Poor                                 |
| C bindings                      | Excellent                   | Average                                                   | Poor                                 |
| Code readability                | With PEP8 it can be perfect | Even with Google JavaScript Style Guide it can be a mess | PEAR Coding Standards can't help it |
| Debugging                       | Excellent                   | Average                                                   | Poor                                 |

Vulnerabilities (Python, JS, PHP): 867324501
Performance (Python, JS, PHP): 35.425.3562.45

My totals: Python - FTW, JS - AVG, PHP - KMN :-)

Additional notes (I didn't include them in the table because they are irrelevant in some cases):
Python is supported on an enormous number of platforms, PHP on slightly fewer, and V8 only on Linux/Windows/MacOSX.
So…

DIY: Business cards in LaTeX

Business cards can be handy in many cases, and it's not a big deal to make them at home. Let me show one method.


We need:
- matte presentation paper (weight 44 lb/165 g/m² in my case);
- a razor paper trimmer (I've used the X-ACTO 12" Personal Paper Trimmer, but can't recommend it: it tends to get stuck in the middle of a cut);
- a printer (I don't know about laser printers, but an inkjet works fine for me);
- LaTeX software.

The last point can be challenging, because LaTeX is not as smooth and user-friendly as it could be. My basic recommendations:

- install Perl (it's required by the auto-pst-pdf package);
- update/install all the required LaTeX packages (e.g., some Linux distributions ship incredibly old LaTeX packages);
- use the "-shell-escape" command-line option for the pdflatex command;
- if nothing helps, don't use "auto-pst-pdf"; build a DVI/PS file instead and convert it to PDF.
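Both routes from the last two recommendations can be wrapped in a small build script. This is a minimal sketch, assuming `pdflatex`, `latex`, `dvips` and Ghostscript's `ps2pdf` are installed. `build_commands`, `build` and the file name `card.tex` are hypothetical, and the fallback route assumes the document compiles with plain `latex`.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file, use_auto_pst_pdf=True):
    """Return the command lines that turn tex_file into a PDF."""
    stem = Path(tex_file).stem
    if use_auto_pst_pdf:
        # auto-pst-pdf needs shell escape so pdflatex can run helper tools.
        return [['pdflatex', '-shell-escape', tex_file]]
    # Fallback: the classic DVI -> PS -> PDF chain.
    return [
        ['latex', tex_file],
        ['dvips', '-o', stem + '.ps', stem + '.dvi'],
        ['ps2pdf', stem + '.ps', stem + '.pdf'],
    ]


def build(tex_file, use_auto_pst_pdf=True):
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_commands(tex_file, use_auto_pst_pdf):
        subprocess.run(cmd, check=True)
```

Usage: `build('card.tex')`, or `build('card.tex', use_auto_pst_pdf=False)` when the auto-pst-pdf route fails.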
Single business card
The code (based on LaTeX QR Bas…

Mini HOWTO: Getting file names in Zip-archives using Bash

I'm gathering stats about my archives, and one of them is the list of all file names inside. There are some challenges here, so let me show the required commands.

Getting file names from the one archive:

unzip -l /path/to/zip-file | tail -n +4 | head -n -2 | cut -c31-

Executing pipelined commands in xargs (each file name reaches the shell as $1; the _ only fills $0):

xargs -n1 sh -c 'command1 "$1" | command2 | ... | commandN' _

For my case I used this command:

find . -iname "*.zip" -print0 | xargs -0 -n1 sh -c 'unzip -l "$1" | tail -n +4 | head -n -2 | cut -c31-' _ | sort | uniq -c

Yeah, yeah, black magic, gotcha. Good luck!
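If the shell pipeline gets too magical, the same statistics can be collected with Python's standard `zipfile` module, which reads the archive directory directly instead of cutting columns out of `unzip -l` output. A minimal sketch: `count_names` is a hypothetical name, and like `find -iname` it matches the `.zip` extension case-insensitively.

```python
import collections
import pathlib
import zipfile


def count_names(root='.'):
    """Count how often each member name occurs across all zip files under root."""
    counts = collections.Counter()
    for path in pathlib.Path(root).rglob('*'):
        if path.is_file() and path.suffix.lower() == '.zip':
            with zipfile.ZipFile(path) as archive:
                # namelist() includes directory entries, as unzip -l does.
                counts.update(archive.namelist())
    return counts
```

Usage: `count_names('.').most_common(10)` lists the ten most frequent names, roughly what `sort | uniq -c` produces.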

CouchDB introduction

The phenomenon of document-oriented databases is quite interesting: many software developers face problems where this kind of database is an excellent choice, yet they don't use one and reinvent the wheel with relational or object-oriented databases instead. It's difficult to say why this happens (ignorance, fear of performance problems, or the desire to reinvent their own wheel), but the situation is widespread. Fortunately, common sense is prevailing, and the NoSQL movement proves it.

I have serious experience with IBM Lotus Notes/Domino, and one of the most interesting features for me when I first studied it was that the application design is stored in documents. Thus deploying a Lotus Notes database is incredibly easy: one just has to copy the NSF file to another location and it's ready for use (not always, but for trivial cases that's enough). Sometimes this is a really useful feature, especially during prototyping, but not so many document-oriente…
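CouchDB has a similar concept: the design document, a regular JSON document whose ID starts with `_design/` and which holds application logic such as views, so it is replicated along with the data. Below is a minimal sketch that builds (but doesn't send) the HTTP request storing one. It assumes a CouchDB at `http://localhost:5984`. The database `notes`, the document name `app`, the view and the helper name are all hypothetical.

```python
import json
import urllib.request

# Hypothetical design document with one view listing note titles.
# CouchDB expects the map function as JavaScript source in a string.
DESIGN_DOC = {
    'views': {
        'titles': {
            'map': "function (doc) { if (doc.title) { emit(doc._id, doc.title); } }",
        },
    },
}


def design_doc_request(base_url, db, name, doc):
    """Build a PUT request that stores doc as _design/<name> in db."""
    url = f'{base_url}/{db}/_design/{name}'
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='PUT',
    )
```

Sending it is `urllib.request.urlopen(design_doc_request('http://localhost:5984', 'notes', 'app', DESIGN_DOC))`. Writing design documents requires database admin rights, and this sketch doesn't handle authentication.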