Haxin Mainframes

A blog about stuff I do, find interesting, or want to blab about..

Random Idea About Language Implementation and Strings

I was reading this awesome post about why this guy appreciates erlang.

And this paragraph caught my eye.

Or take string concatenation. If you pop open the implementation of string concatenation in Perl, Ruby, or JavaScript, you’ll are certain to find an if statement, a realloc, and a memcpy. That is, when you concatenate two strings, the first string is grown to make room for the second, and then the second is copied into the first. This approach has worked for decades and is the “obvious” thing to do. Erlang’s approach is non-obvious, and, I believe, correct. Erlang does not use a contiguous chunk of memory to represent a sequence of bytes. Instead, it represents a sequence of bytes as nested lists of non-contiguous chunks of memory. The result is that concatenating two strings takes O(1) time in Erlang, compared O(N) time in other languages. This is why template rendering in Ruby, Python, etc. is slow, but very fast in Erlang.

I just thought it would be a cool little mini project to try and re implement Python and or Ruby strings to act like Erlang strings (the linked list of bytes versus contiguous). I am kind of using this as a note to myself since writing things down in other places just gets lost a lot of the time..

It would be cool to do this though and then compare the new ruby build with the previous in a bunch of different speed string tests.

After talking to some friends at work I realized that this already done. Using the rope data structure. A cool implementation, librope is a very interesting read.

After checking the librope implementation out I realized that using these strings would only be a benfit in very specific situations, long modifiable strings.

However, having an implementation accessbile in a standard library would deff be a cool thing to have. I could see this being super helpful when doing something like building a website response from templates.

Actually from a quick google it sounds like this is how PyPy implemented strings.