Thursday, February 10, 2011

Data Types

To continue with the design of the new language, I'd like to talk about data types. First, I'd like to talk about the "core" data type. I think every language has one in some form, because every language needs a way to pass arbitrary data to functions. In SPL, the core data type is a bit string. Every kind of data, even a complex data structure, is represented as a series of bits in memory. Because all data can be treated as a series of bits, we get generics automatically.
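As a rough illustration of how bit strings give generics for free, here is a minimal Python sketch. Python only stands in for SPL here; `ByteStack` is a hypothetical name, and `struct.pack`/`str.encode` play the role of turning typed values into raw bits and back. The container only ever stores bytes, so the same code holds integers, strings, and floats without knowing which is which.

```python
import struct


class ByteStack:
    """A stack that stores every value only as raw bytes.

    It never inspects what the bytes mean, so the same code works for
    any type the caller knows how to encode and decode (hypothetical
    illustration of "generics via bit strings").
    """

    def __init__(self) -> None:
        self._items: list[bytes] = []

    def push(self, data: bytes) -> None:
        self._items.append(bytes(data))  # store a copy of the raw bits

    def pop(self) -> bytes:
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


stack = ByteStack()
stack.push(struct.pack(">i", -42))   # a 32-bit signed integer
stack.push("héllo".encode("utf-8"))  # a UTF-8 string
stack.push(struct.pack(">d", 2.5))   # a 64-bit float

# The caller decodes each value; the container never needed its type.
print(struct.unpack(">d", stack.pop())[0])  # 2.5
print(stack.pop().decode("utf-8"))          # héllo
print(struct.unpack(">i", stack.pop())[0])  # -42
```

The catch is that the caller must remember how each value was encoded, since the container cannot check it. That is the flip side of needing no type information.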

You see, because all data can be treated as a bit string, we can pass arbitrary data around easily whenever we need to, since no type information is required. This idea came from Erlang, where bit strings are part of the language and binary data can easily be parsed into other data types, such as integers. Making bit strings part of the core language not only makes generics easy, but also makes working with binary data in general easier.
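Erlang's bit syntax lets you write a pattern such as `<<Version:4, Kind:4, Length:16>> = Packet` to pull integer fields of arbitrary bit widths out of a binary. The Python sketch below imitates that idea with two hypothetical helpers, `take_bits` and `parse_segments`. It only handles unsigned, big-endian fields, which is also Erlang's default for integer segments.

```python
def take_bits(data: bytes, offset: int, width: int) -> int:
    """Return `width` bits starting at bit `offset` (MSB first) as an unsigned int."""
    total = len(data) * 8
    if offset < 0 or width <= 0 or offset + width > total:
        raise ValueError("bit range out of bounds")
    whole = int.from_bytes(data, "big")  # treat the bytes as one big bit string
    shift = total - offset - width       # drop the bits to the right of the field
    return (whole >> shift) & ((1 << width) - 1)


def parse_segments(data: bytes, widths: list[int]) -> list[int]:
    """Split `data` into consecutive unsigned fields, like <<A:4, B:4, C:16>>."""
    values, offset = [], 0
    for w in widths:
        values.append(take_bits(data, offset, w))
        offset += w
    return values


# 0x4A 0x01 0x2C = 0100 1010 0000000100101100
packet = bytes([0x4A, 0x01, 0x2C])
version, kind, length = parse_segments(packet, [4, 4, 16])
print(version, kind, length)  # 4 10 300
```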

As for the other data types, one of the goals of the language is to make things easier, which in turn should result in better software, and this focus on simplicity is reflected in the data types. To that end, the number of data types should be relatively small. Emphasis on relatively. The types I currently see are these:
  • bit string
  • integer - Arbitrary-sized integer.
  • decimal - An arbitrary-size, arbitrary-precision numeric type. This is not a binary float, but a base-10 number carried to as many digits of precision as needed.
  • char - A UTF-8 character.
  • string - A series of chars.
  • boolean
  • array - A fixed-size collection of elements that are all the same type.
  • tuple/struct
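To make the difference between `integer`/`decimal` and an ordinary float concrete, here is a small Python sketch. Python's built-in `int` is already arbitrary-sized, and its standard `decimal` module is a reasonable stand-in for an arbitrary-precision decimal type, with the caveat that its precision is a configurable setting (28 significant digits by default) rather than truly unlimited.

```python
from decimal import Decimal, localcontext

# integer: Python ints grow as needed, so there is no fixed-width overflow.
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# A binary float cannot represent 0.1 exactly, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)  # False

# A decimal type stores base-10 digits, so the same sum is exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Precision is a setting rather than a hardware limit; raise it locally.
with localcontext() as ctx:
    ctx.prec = 50
    print(Decimal(1) / Decimal(7))  # 1/7 to 50 significant digits
```

Even an arbitrary-precision decimal cannot store a value like 1/7 exactly; it can only carry as many digits as the precision allows, which is why the last example raises the precision before dividing.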
I'm not sure about the last one, as each has its benefits. Both group different pieces of data together, but they differ in how they do it. Tuples can have elements added or removed dynamically and carry no type information for their elements, so elements of different types can be grouped together on the fly. Structs, on the other hand, are predefined: each element has type information associated with it, and elements cannot be added or removed dynamically. Both have advantages, and I'm just not sure which is better for this language. The dynamic nature of a tuple is nice, but a well-defined, self-describing structure is also nice and fits the language's principle of making things safer.

Regardless, complex data types are expected to be created and used. Lists, for example, would not be part of the type system, but they are expected to be built because they are so useful. Lists are left out of the core language because different implementations have different benefits, and tying the language to one implementation didn't seem right, especially since new algorithms and data structures are developed over time as systems evolve.
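Here is a minimal Python sketch of what leaving lists out of the core might look like: code depends only on a small interface, and different list implementations plug in behind it. The names `ListLike`, `ArrayList`, `LinkedList`, and `build_reversed` are hypothetical and chosen just for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Iterator, Optional, Protocol, TypeVar

T = TypeVar("T")


class ListLike(Protocol[T]):
    """The operations callers rely on; the storage strategy is left open."""

    def prepend(self, item: T) -> None: ...
    def __iter__(self) -> Iterator[T]: ...


class ArrayList(Generic[T]):
    """Contiguous storage: prepend shifts every element (O(n))."""

    def __init__(self) -> None:
        self._items: list[T] = []

    def prepend(self, item: T) -> None:
        self._items.insert(0, item)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)


@dataclass
class _Node(Generic[T]):
    value: T
    rest: Optional[_Node[T]] = None


class LinkedList(Generic[T]):
    """Singly linked nodes: O(1) prepend, but no O(1) indexing."""

    def __init__(self) -> None:
        self._head: Optional[_Node[T]] = None

    def prepend(self, item: T) -> None:
        self._head = _Node(item, self._head)

    def __iter__(self) -> Iterator[T]:
        node = self._head
        while node is not None:
            yield node.value
            node = node.rest


def build_reversed(items: list[T], target: ListLike[T]) -> list[T]:
    """Works with any ListLike implementation without knowing which one."""
    for item in items:
        target.prepend(item)
    return list(target)


print(build_reversed([1, 2, 3], ArrayList()))   # [3, 2, 1]
print(build_reversed([1, 2, 3], LinkedList()))  # [3, 2, 1]
```

Which implementation is better depends on the workload: contiguous storage suits indexing-heavy code, while linked nodes make prepending cheap, and callers of `build_reversed` never need to know which one they got.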

That's all I have tonight. I know it's not much, but I like it. There are disadvantages to having so few types, but I believe it's the right trade-off here: preventing mistakes and making it as easy as possible to accomplish what you're trying to do matters more than creating the fastest language.
