Posts

Showing posts from 2011

split function behavior differences

A little note from my debugging experience. The split function behaves differently (and, I would say, unexpectedly) on an empty string in different programming languages, and this can cause hard-to-find bugs, especially if you work in several languages at once. I've created a table covering the popular programming languages:

Language     Split without parameters   Split with parameter
Python       ''.split()=[]              ''.split(',')=['']
Ruby         ''.split()=[]              ''.split(',')=[]
JavaScript   ''.split()=['']            ''.split(',')=['']
PHP          N/A                        explode(',', '')=array(0=>'')
Java         N/A                        "".split(",")={""}
C#           "".Split()={""}            "".Split(',')={""}

As you can see, sometimes it returns an empty array, and sometimes an array with one empty element. So please be careful with the split operation :)
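One way to guard against the difference in Python is a small wrapper that always returns an empty list for an empty string. This is only a minimal sketch; the helper name split_fields is made up for illustration and is not part of any library.

```python
def split_fields(text, sep=","):
    """Split text on sep, but return [] for an empty string.

    split_fields is a hypothetical helper, named here for illustration only.
    """
    if not text:
        return []
    return text.split(sep)


print("".split())            # []
print("".split(","))         # ['']
print(split_fields(""))      # []
print(split_fields("a,,b"))  # ['a', '', 'b']
```

Empty fields in the middle of the string are kept, so only the empty-input case changes.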

Python templating comparison by memory consumption

Another comparison, this time between standard formatting, the more advanced standard string.Template, Mako, Genshi and Jinja2. Here is the code I used for measuring:

    #!/usr/bin/env python
    import sys

    NAME = 'name'

    def render1():
        template = "<p>Hello %s!</p>"
        return template % NAME

    def render2():
        from string import Template
        template = Template("<p>Hello ${name}!</p>")
        return template.substitute(dict(name=NAME))

    def render3():
        from mako.template import Template
        template = Template("<p>Hello ${name}!</p>")
        return template.render(name=NAME)

    def render4():
        from genshi.template import MarkupTemplate
        tmpl = MarkupTemplate('<p>Hello $name!</p>')
        stream = tmpl.generate(name=NAME)
        return stream.render('xhtml')

    def render5():
        from jinja2 import Template
        template = Template('<p>Hello

Web application framework comparison by memory consumption

Memory consumption is somewhat specific to my current area of software development, but I did some research recently and maybe these results can be useful for others. Of course, I know that a precise comparison is very difficult to carry out, but I only needed the overall picture. And let me admit that the results are pretty interesting and even frustrating (at least for me). As a basis I took the so-starving project, and measured the initial RSS (Resident Set Size) of each process (local development webservers). Platform: x86_64 Linux (latest Ubuntu with all updates). As a reference, here is the RSS of the interpreters in interactive console mode:

Interpreter              Version   RSS (kB)
stackless python         2.6.4     3916
ruby (via irb)           1.8.7     4664
python                   2.7.1     5624
php                      5.3.5     6924
v8 (via node.js shell)   2.5.9.9   8796

One more reference: the simplest WSGI app (example in the Python documentation). Its RSS: 7336 kB, so I assume it's almost impossible to consume l

Packing executables

One of the biggest challenges with embedded platforms is the limit on file sizes. Here is a hint on how to make executables smaller: strip them and pack them.

- strip is a tool from GNU binutils; it discards symbols. Usually the platform toolchain has one.
- upx is an excellent executable packer. It can be downloaded as a binary or as sources from the UPX sf.net site.

As an example, I'll show you the packing of the Python 2.6.7 binary:

    $ ls -s --block-size=KB python
    6493kB python
    $ strip -s python
    $ ls -s --block-size=KB python
    1696kB python
    $ upx --best python
                           Ultimate Packer for eXecutables
                              Copyright (C) 1996 - 2010
    UPX 3.05        Markus Oberhumer, Laszlo Molnar & John Reiser   Apr 27th 2010

            File size         Ratio      Format      Name
       --------------------   ------   -----------   -----------
       1692400 ->    608896   35.98%  linux/ElfAMD   python

    Packed 1 file.
    $ ls -s --block-size=

Compiling Python: Modules/Setup

A little hint for Python developers who use it on embedded or unconventional platforms (like Cray supercomputers, if you're lucky): it can be compiled and used without any dynamic libraries. I ran into a problem with a stripped libc.so: some Python shared objects (like _socket.so) try to use it, but can't find anything because it's stripped. The only choice I had was to use Python without these shared objects. Fortunately, Python supports this out of the box. After configuring it, you can use the Modules/Setup file to select which modules have to be compiled into the Python binary:

    The build process works like this:
     1. Build all modules that are declared as static in Modules/Setup,
        combine them into libpythonxy.a, combine that into python.
     2. Build all modules that are listed as shared in Modules/Setup.
     3. Invoke setup.py. That builds all modules that
        a) are not builtin, and
        b) are not listed in Modules/Setup, and
        c) can be build on the target

For exam

Embedding Python

I just want to share some useful links about embedding Python into your C-based application:

- the main article: Embedding Python in Another Application
- an additional article that shows the peculiarities of multithreading, sockets and shared memory: Embedding Python in C/C++ (Part1, Part2)
- Cython (or Pyrex) can be used to reduce handwritten code for Python interoperability: A quick Cython introduction

If you have doubts about Python's size, there are some minimal implementations: see the Embedded Python article. Let me quote tinypy:

    tinypy is a minimalist implementation of python in 64k of code ... What more could you possibly want?? a pony?

However, I highly recommend using the classic CPython implementation (basically because it has a huge number of contributors and supporters, and has excellent documentation). It can be stripped down to 1-2 megabytes, depending on your requirements.

Introduction to ReviewBoard

Review Board is a powerful web-based code review tool that offers developers an easy way to handle code reviews. It scales well from small projects to large companies and offers a variety of tools to take much of the stress and time out of the code review process. Review Board is written in the Python programming language and makes use of the Django web framework.

Installation

Install the auxiliary packages, if needed, and all their dependencies:

    $ sudo apt-get install python-setuptools
    $ sudo apt-get install python-svn
    $ sudo apt-get install python-subversion
    $ sudo apt-get install apache2
    $ sudo apt-get install libapache2-mod-python
    $ sudo apt-get install git

Clone the ReviewBoard package and install it:

    $ git clone git://github.com/reviewboard/reviewboard.git
    $ cd reviewboard
    $ sudo python setup.py develop

Also install the post-review tool:

    $ sudo easy_install -U RBTools

Set up the required site for the ReviewBoard (for Apache/SQLite backend, otherwise - see Creati

Your Language Sucks (and about PHP again)

I've found a good wiki article about the faults of programming languages and want to share the link: http://wiki.theory.org/YourLanguageSucks . The "winner" is PHP as usual, but Python gets a mention too (as well as Ruby :) ). I agree with almost everything there, but in my eyes Python is still the best choice for programming. Nevertheless, I also want to point out some other recent links related to PHP (I really shouldn't, but I just can't help it):

- PHP Sucks
- PHP Must Die
- What are the horrors of PHP?
- What factors during the development of PHP contributed to it being such a poorly designed language?

And a quote from the interview with Rasmus Lerdorf (the creator of PHP):

    I don't know how to stop it, there was never any intent to write a programming language [...] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way.

Facebook and PHP

A common fallacy is "If everybody uses it, I also have to use it - millions of people can't be wrong". Apparently, they can, and a huge codebase, support and knowledge mean nothing; otherwise we would still be using Fortran, Cobol, Basic and other almost-dead monsters. Another common fallacy is "If a big corporation uses it, I also have to use it". That is very doubtful: decisions are almost always made in a hurry and/or by the wrong people and/or without serious consideration. And after some time, it is difficult to reverse the previous decision because that would require a huge effort. A good example is Facebook. Let me quote the presentation HipHop for PHP Tech Tasting:

PHP is problematic for Facebook:
- High CPU usage
- High memory usage
- Reuse of PHP logic in other systems
- Extensions are hard to write for most PHP developers

But huge codebase, strange affection towards PHP (in what universe "loose typing and universal ar

Python vs JS vs PHP for embedded systems

UPDATE (07/18/17): The original article was written in 2011 and was pretty much outdated, so I've updated the numbers and conclusions.

I was asked which programming language is preferable for website development on embedded systems (with limited resources). Here is my small investigation in table form. Please note that the question was only about these 3 programming languages; there are better candidates for embedded systems now (for example, Rust). The Memory and Performance overhead numbers are based on the n-body benchmark and calculated relative to the "C gcc #4" measurements.

                       Python 3                             Node.js                             PHP
Memory Overhead        x7.62                                x29.04                              x8.53
Type System            Strong Typing                        Weak Typing                         Weak Typing
Vulnerabilities        220                                  1411                                5626
Performance Overhead   x78.31                               x2.82                               x30.12
Documentation          Excellent                            Excellent                           Average
C Bindings             Excellent                            Average                             Poor
Code Readability       With PEP8 it can be almost perfect   ESLint enforces very good style     PEAR Coding Stan

DIY: Business cards in LaTeX

A business card can be handy in many cases, and it's not a big deal to create one at home. Let me show one of the methods.

Prerequisites

We need to have: matte presentation paper (weight 44 lb/165 g/m² in my case); a razor paper trimmer (I've used the X-ACTO 12" Personal Paper Trimmer, but can't recommend it: it has a habit of getting stuck in the middle of the trimming process); a printer (I don't know about laser printers, but an inkjet one works fine for me); LaTeX software. The last point can be challenging, because LaTeX is not as smooth and user-friendly as it could be. My basic recommendations: install Perl (it's required by the auto-pst-pdf package); update/install all the required LaTeX packages (e.g., some Linux distributions provide incredibly old LaTeX packages); use the "-shell-escape" command line option for the pdflatex command; if nothing helps, don't use "auto-pst-pdf", but build a DVI/PS file and convert it to PDF. Single business ca

Mini HOWTO: Getting file names in Zip-archives using Bash

I'm gathering stats about my archives, and one of them is the list of all the file names inside. There are some challenges here, so let me show the required commands.

Getting the file names from a single archive:

    unzip -l /path/to/zip-file | tail -n +4 | head -n -2 | cut -c31-

Executing pipelined commands in xargs:

    xargs -I {} sh -c 'command1 | command2 | ... | commandN'

For my case I've used this expression:

    find . -iname "*.zip" -print0 | xargs -0 -I {} sh -c 'unzip -l "{}" | tail -n +4 | head -n -2 | cut -c31-' | sort | uniq -c

Yeah, yeah, black magic, gotcha. Good luck!
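If the column offsets of unzip -l feel too fragile, the same count can be sketched with Python's standard zipfile module. This is an alternative under my own assumptions, not the method used above; the function name count_zip_member_names is made up for illustration.

```python
import zipfile
from collections import Counter
from pathlib import Path


def count_zip_member_names(root):
    """Count how often each member name occurs in the zip archives under root.

    count_zip_member_names is a hypothetical helper, named for illustration.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Match the extension case-insensitively, like `find -iname "*.zip"`.
        if path.is_file() and path.suffix.lower() == ".zip":
            with zipfile.ZipFile(path) as archive:
                # namelist() returns every entry, directories included.
                counts.update(archive.namelist())
    return counts
```

Calling count_zip_member_names(".") returns a Counter that maps each name to the number of archives' entries carrying it, which corresponds to the sort | uniq -c step of the pipeline.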

CouchDB introduction

The phenomenon of document-oriented databases is quite interesting: many software developers face problems where this kind of database is an excellent choice, yet they don't use one and reinvent the wheel with relational or object-oriented databases. It is difficult to say why this happens (ignorance, fear of performance problems, or the desire to reinvent their own wheel), but the situation is widespread all over the world. Fortunately, common sense is prevailing, and the NoSQL movement proves it. I have considerable experience with IBM Lotus Notes/Domino, and one of the most interesting features for me in the early stages of studying it was that the application design is saved in the documents. Thus the deployment of a Lotus Notes database is incredibly easy: one just has to copy the NSF file to another location and it's ready for use (not always, but for trivial cases it's enough). Sometimes it can be a really useful feature, especially during prototyping, but not so many document-orie