Additionally, if one has an array of floats, is '+' overloaded to be the same as a vector addition, or an array concatenation?
In D, these problems are avoided by introducing a new binary operator ~ as the concatenation operator. It works with arrays (of which strings are a subset). ~= is the corresponding append operator. ~ on arrays of floats would concatenate them, + would imply a vector add. Adding a new operator makes it possible for orthogonality and consistency in the treatment of arrays. (In D, strings are simply arrays of characters, not a special type.)
const char abc[5] = "world"; string str = "hello" + abc;That isn't going to work. But it does work when the core language knows about strings:
const char[5] abc = "world"; char[] str = "hello" ~ abc;
const char abc[] = "world"; : sizeof(abc)/sizeof(abc[0])-1 : strlen(abc) string str : str.length()That kind of inconsistency makes it hard to write generic templates. Consider D:
char[5] abc = "world"; : abc.length char[] str : str.length
string str; if (str.empty()) // string is emptyIn D, an empty string is just null:
char[] str; if (!str) // string is empty
string str; str.resize(newsize);D takes advantage of knowing that str is a string, and so resizing it is just changing the length property:
char[] str; str.length = newsize;
string s1 = "hello world"; string s2(s1, 6, 5); // s2 is "world"D has the array slice syntax, not possible with C++:
char[] s1 = "hello world"; char[] s2 = s1[6 .. 11]; // s2 is "world"Slicing, of course, works with any array in D, not just strings.
string s1 = "hello world"; string s2 = "goodbye "; s2.replace(8, 5, s1, 6, 5); // s2 is "goodbye world"D uses the slice syntax as an lvalue:
char[] s1 = "hello world"; char[] s2 = "goodbye "; s2[8..13] = s1[6..11]; // s2 is "goodbye world"
void foo(const char *); string s1; foo(s1.c_str());In D, strings can be implicitly converted to char*:
void foo(char *); char[] s1; foo(s1);Note: some will argue that it is a mistake in D to have an implicit conversion from char[] to char*.
switch (str) { case "hello": case "world": ... }where str can be any of literal "string"s, fixed string arrays like char[10], or dynamic strings like char[]. A quality implementation can, of course, explore many strategies of efficiently implementing this based on the contents of the case strings.
string str = "hello"; str.replace(1,2,2,'?'); // str is "h??lo"In D, use the array slicing syntax in the natural manner:
char[] str = "hello"; str[1..2] = '?'; // str is "h??lo"
D strings are reference types, and the memory is garbage collected. This means that only references need to be copied, not the string data. D strings can refer to data in the static data segment, data on the stack, data inside other strings, objects, file buffers, etc. There's no need to keep track of the 'owner' of the string data.
The obvious question is if multiple D strings refer to the same string data, what happens if the data is modified? All the references will now point to the modified data. This can have its own consequences, which can be avoided if the copy-on-write convention is followed. All copy-on-write is is that if a string is written to, an actual copy of the string data is made first.
The result of D strings being reference only and garbage collected is that code that does a lot of string manipulating, such as an lzw compressor, can be a lot more efficient in terms of both memory consumption and speed.
import file; int main (char[][] args) { int w_total; int l_total; int c_total; int[char[]] dictionary; printf(" lines words bytes file\n"); for (int i = 1; i < args.length; ++i) { char[] input; int w_cnt, l_cnt, c_cnt; int inword; int wstart; input = cast(char[])file.read(args[i]); for (int j = 0; j < input.length; j++) { char c; c = input[j]; if (c == '\n') ++l_cnt; if (c >= '0' && c <= '9') { } else if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z') { if (!inword) { wstart = j; inword = 1; ++w_cnt; } } else if (inword) { char[] word = input[wstart .. j]; dictionary[word]++; inword = 0; } ++c_cnt; } if (inword) { char[] w = input[wstart .. input.length]; dictionary[w]++; } printf("%8lu%8lu%8lu %.*s\n", l_cnt, w_cnt, c_cnt, args[i]); l_total += l_cnt; w_total += w_cnt; c_total += c_cnt; } if (args.length > 2) { printf("--------------------------------------\n%8lu%8lu%8lu total", l_total, w_total, c_total); } printf("--------------------------------------\n"); foreach (char[] word1; dictionary.keys.sort) { printf("%3d %.*s\n", dictionary[word1], word1); } return 0; }Two people have written C++ implementations using the C++ standard template library, wccpp1 and wccpp2. The input file alice30.txt is the text of "Alice in Wonderland." The D compiler, dmd, and the C++ compiler, dmc, share the same optimizer and code generator, which provides a more apples to apples comparison of the efficiency of the semantics of the languages rather than the optimization and code generator sophistication. Tests were run on a Win XP machine. dmc uses STLport for the template implementation.
Program | Compile | Compile Time | Run | Run Time |
---|---|---|---|---|
D wc | dmd wc -O -release | 0.0719 | wc alice30.txt >log | 0.0326 |
C++ wccpp1 | dmc wccpp1 -o -I\dm\stlport\stlport | 2.1917 | wccpp1 alice30.txt >log | 0.0944 |
C++ wccpp2 | dmc wccpp2 -o -I\dm\stlport\stlport | 2.0463 | wccpp2 alice30.txt >log | 0.1012 |
The following tests were run on linux, again comparing a D compiler (gdc) and a C++ compiler (g++) that share a common optimizer and code generator. The system is Pentium III 800MHz running RedHat Linux 8.0 and gcc 3.4.2. The Digital Mars D compiler for linux (dmd) is included for comparison.
Program | Compile | Compile Time | Run | Run Time |
---|---|---|---|---|
D wc | gdc -O2 -frelease -o wc wc.d | 0.326 | wc alice30.txt > /dev/null | 0.041 |
D wc | dmd wc -O -release | 0.235 | wc alice30.txt > /dev/null | 0.041 |
C++ wccpp1 | g++ -O2 -o wccpp1 wccpp1.cc | 2.874 | wccpp1 alice30.txt > /dev/null | 0.086 |
C++ wccpp2 | g++ -O2 -o wccpp2 wccpp2.cc | 2.886 | wccpp2 alice30.txt > /dev/null | 0.095 |
These tests compare gdc with g++ on a PowerMac G5 2x2.0GHz running MacOS X 10.3.5 and gcc 3.4.2. (Timings are a little less accurate.)
Program | Compile | Compile Time | Run | Run Time |
---|---|---|---|---|
D wc | gdc -O2 -frelease -o wc wc.d | 0.28 | wc alice30.txt > /dev/null | 0.03 |
C++ wccpp1 | g++ -O2 -o wccpp1 wccpp1.cc | 1.90 | wccpp1 alice30.txt > /dev/null | 0.07 |
C++ wccpp2 | g++ -O2 -o wccpp2 wccpp2.cc | 1.88 | wccpp2 alice30.txt > /dev/null | 0.08 |
#include <algorithm> #include <cstdio> #include <fstream> #include <iterator> #include <map> #include <vector> bool isWordStartChar (char c) { return isalpha(c); } bool isWordEndChar (char c) { return !isalnum(c); } int main (int argc, char const* argv[]) { using namespace std; printf("Lines Words Bytes File:\n"); map<string, int> dict; int tLines = 0, tWords = 0, tBytes = 0; for(int i = 1; i < argc; i++) { ifstream file(argv[i]); istreambuf_iterator<char> from(file.rdbuf()), to; vector<char> v(from, to); vector<char>::iterator first = v.begin(), last = v.end(), bow, eow; int numLines = count(first, last, '\n'); int numWords = 0; int numBytes = last - first; for(eow = first; eow != last; ) { bow = find_if(eow, last, isWordStartChar); eow = find_if(bow, last, isWordEndChar); if(bow != eow) ++dict[string(bow, eow)], ++numWords; } printf("%5d %5d %5d %s\n", numLines, numWords, numBytes, argv[i]); tLines += numLines; tWords += numWords; tBytes += numBytes; } if(argc > 2) printf("-----------------------\n%5d %5d %5d\n", tLines, tWords, tBytes); printf("-----------------------\n\n"); for(map<string, int>::const_iterator it = dict.begin(); it != dict.end(); ++it) printf("%5d %s\n", it->second, it->first.c_str()); return 0; }