Introduction
Often an Arduino is used to process incoming string data. For example:
- Location data from a GPS
- Card data from an RFID reader
- Commands from a connected computer
- Commands from another Arduino
- Information from a sensor
There is usually quite a bit of debate about the "best" way of handling this incoming data. By its nature the data arrives one byte at a time (usually) from a serial port, although it could also come from SPI or I2C connections.
Bearing in mind that there is usually a shortage of both RAM (Random Access Memory), which is used to store variables, and PROGMEM (program memory), which is used to store your program (or "sketch", as Arduino calls it), we tend to want to use methods that minimize the use of both. Oh, and they should also be as fast as reasonably possible, so we can do something useful with the data.
A note about the "delay" function
None of the examples below use the Arduino "delay" function. It is not necessary for processing incoming serial data, and in many cases its use actually causes problems. You are best off avoiding delays and instead using the techniques shown below: build up a string a byte at a time in the main loop, and leave the processor free to do other things.
Possible methods of storing strings
The main methods you could choose are:
- Use C-style strings
- Use the Arduino "String" class
- Use the STL (Standard Template Library) "string" class
- Use a state machine
Concatenation
The word "concatenate" means "join together". In general, programmers read incoming characters from the serial port, adding (concatenating) them to the existing string, until some sort of delimiter is reached (e.g. the newline character). When the delimiter is reached, the entire string is then processed.
An example of a string from a GPS is:
$GPRMC,161229.487,A,3723.2475,N,12158.3416,W,0.13,309.62,120598,,*10
As incoming data you would receive '$', then 'G', then 'P', then 'R' and so on.
Thus, to build up the whole string you concatenate each character until the newline character is received.
The fourth method mentioned above (a state machine) uses an alternative approach that I will describe later. This does not require the entire string to be stored at once.
The general technique for concatenation is thus:
- Declare a variable to hold the whole string, making it initially empty
- As each byte arrives, add it to the end of the string
- When the delimiter arrives (the end-of-line character) the string is considered "complete" and we now parse it to extract out useful information (such as the latitude and longitude).
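The three steps above can be sketched in desktop C++ so the logic can be tried without a board. This is a minimal sketch under stated assumptions: std::string is used only for brevity (any of the storage methods below could take its place), and LineAssembler and its onLine callback are hypothetical names, not part of any library.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical class: accumulates bytes into a line and hands each complete
// line to a callback. std::string stands in for whichever storage is chosen.
class LineAssembler {
 public:
  // Step 1: the string starts out empty; onLine receives each complete line.
  explicit LineAssembler(std::function<void(const std::string&)> onLine)
      : onLine_(std::move(onLine)) {}

  void addByte(char c) {
    if (c == '\n') {   // Step 3: delimiter reached, so the line is complete
      onLine_(line_);  // parse it (e.g. extract latitude and longitude)
      line_.clear();   // start afresh for the next line
    } else {
      line_ += c;      // Step 2: append each byte to the end as it arrives
    }
  }

 private:
  std::function<void(const std::string&)> onLine_;
  std::string line_;
};
```

On an Arduino, addByte would be called with each Serial.read () result from loop, as in the examples that follow; bytes after the last newline simply wait in the buffer until their line is complete.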
Using C-style strings
This technique requires an array of characters to be allocated, where you have to know the size in advance. In the example below "inputLine" is this array, and we have chosen to allow for 100 characters, which is a bit more than the size of the expected string from the GPS.
const unsigned int MAX_INPUT = 100; // how much serial data we expect before a newline
char inputLine [MAX_INPUT];         // where to store the string
unsigned int inputPosition = 0;     // how much we have stored

void setup ()
{
  Serial.begin (115200);
} // end of setup

// here to process incoming serial data after a terminator received
void processData (const char * data)
{
  // decode the data here
} // end of processData

void processIncomingByte (const byte c)
{
  switch (c)
  {
    case '\n':   // end of text
      inputLine [inputPosition] = 0;  // terminating null byte
      // terminator reached! process inputLine here ...
      processData (inputLine);
      // reset buffer for next time
      inputPosition = 0;
      break;

    case '\r':   // discard carriage return (a GPS ends each line with CR LF)
      break;

    default:
      // keep adding if not full ... allow for terminating null byte
      if (inputPosition < (MAX_INPUT - 1))
        inputLine [inputPosition++] = c;
      break;
  } // end of switch
} // end of processIncomingByte

void loop ()
{
  if (Serial.available () > 0)
    processIncomingByte (Serial.read ());
  // do other stuff here like testing digital input (button presses) ...
} // end of loop
Analysis
- Time taken to concatenate 100 bytes: 44 µs.
- Memory used: 517 bytes.
- Sketch size: 1,782 bytes.
- Fragmentation of dynamic memory: none
Using the Arduino "String" class
The String class is part of the Arduino core library that comes with the IDE (Integrated Development Environment). You don't need to install it, so no "#include" directives are needed. It is simple to use, but that simplicity comes at a cost: speed and program size.
String inputLine; // where to store the string

void setup ()
{
  Serial.begin (115200);
} // end of setup

// here to process incoming serial data after a terminator received
void processData (const String & data)  // pass by reference to avoid copying the String
{
  // decode the data here
} // end of processData

void processIncomingByte (const byte c)
{
  switch (c)
  {
    case '\n':   // end of text
      // terminator reached! process inputLine here ...
      processData (inputLine);
      // reset for next time
      inputLine = "";
      break;

    case '\r':   // discard carriage return (a GPS ends each line with CR LF)
      break;

    default:
      // keep adding; the cast matters because String += byte appends
      // the byte's decimal value (e.g. "36" for '$') rather than the character
      inputLine += (char) c;
      break;
  } // end of switch
} // end of processIncomingByte

void loop ()
{
  if (Serial.available () > 0)
    processIncomingByte (Serial.read ());
  // do other stuff here like testing digital input (button presses) ...
} // end of loop
Analysis
- Time taken to concatenate 100 bytes: 2,480 µs.
- Memory used: 526 bytes.
- Sketch size: 3,746 bytes.
- Fragmentation of dynamic memory: none
This particular example did not fragment memory; however, whenever you concatenate a byte at a time there is a danger that the repeated memory allocations leave fragments of unused memory behind. Over time the free memory can be broken into pieces too small to use, with the possible result that your program crashes, maybe hours later.
Using the Standard Template Library "string" class
The Standard Template Library (STL) comes with its own "string" class (note the lower-case "s"), which behaves in a similar way to the Arduino one. However, it has some advantages of its own.
To use it you need to download it from:
http://andybrown.me.uk/ws/2011/01/15/the-standard-template-library-stl-for-avr-with-c-streams/
Then follow the instructions on that page for installing it. Basically you have to copy a whole lot of files into the hardware/tools/avr/avr/include subdirectory of the Arduino installation.
#include <iterator>
#include <string>
#include <pnew.cpp>   // placement new implementation

std::string inputLine; // where to store the string

void setup ()
{
  Serial.begin (115200);
} // end of setup

// here to process incoming serial data after a terminator received
void processData (const char * data)
{
  // decode the data here
} // end of processData

void processIncomingByte (const byte c)
{
  switch (c)
  {
    case '\n':   // end of text
      // terminator reached! process inputLine here ...
      processData (inputLine.c_str ());
      // reset for next time
      inputLine.clear ();
      break;

    case '\r':   // discard carriage return (a GPS ends each line with CR LF)
      break;

    default:
      // keep adding
      inputLine += c;
      break;
  } // end of switch
} // end of processIncomingByte

void loop ()
{
  if (Serial.available () > 0)
    processIncomingByte (Serial.read ());
  // do other stuff here like testing digital input (button presses) ...
} // end of loop
Analysis
- Time taken to concatenate 100 bytes: 468 µs.
- Memory used: 554 bytes.
- Sketch size: 2,994 bytes.
- Fragmentation of dynamic memory: one block of 115 bytes.
This example fragmented memory (one block). However, since the string class (as opposed to the String class) allocates memory in larger chunks, the fragmentation should be more controlled; that is, there should not be lots of tiny fragments of unused memory. You can reduce this fragmentation by using the "reserve" function. For example:
inputLine.reserve (100); // reserve 100 bytes
The difference between this and C-style strings is that, although we have reserved 100 bytes for the string (to reduce memory fragmentation), it is still possible to keep appending past the 100-character mark without causing problems (except possibly running out of memory).
Adding that line to the test sketch reduced the time to concatenate 100 bytes from 468 µs to 300 µs, and the size of the free block from 115 bytes to 8 bytes.
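The effect of reserve can also be seen on a desktop compiler. This sketch uses the standard library's std::string rather than the AVR STL port installed above, so growth patterns (and therefore the counts) will differ. countCapacityChanges is a hypothetical helper that counts how often the capacity grows while characters are appended one at a time; each growth means a fresh, larger allocation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append n characters one at a time to a std::string and
// count how often its capacity grows (each growth is a new allocation).
// If reserveFirst is non-zero, reserve() is called before appending.
int countCapacityChanges(std::size_t n, std::size_t reserveFirst) {
  std::string s;
  if (reserveFirst > 0) s.reserve(reserveFirst);  // capacity() is now >= reserveFirst
  std::size_t cap = s.capacity();
  int changes = 0;
  for (std::size_t i = 0; i < n; i++) {
    s += 'a';
    if (s.capacity() != cap) {  // the string had to reallocate
      cap = s.capacity();
      ++changes;
    }
  }
  return changes;
}
```

With reserve (100) first, appending 100 characters should cause no growth at all; without it, the string grows one or more times, depending on the library's growth policy.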
Timing sketch
The sketch I used to compare timings and RAM usage was:
#include <ProfileTimer.h>
#include <iterator>
#include <string>
#include <pnew.cpp>   // placement new implementation
#include <memdebug.h>

const int STRING_SIZE = 100;

void showMemoryUsed ()
{
  Serial.print (F("Memory free currently = "));
  Serial.println (getFreeMemory ());
  Serial.print (F("Memory used currently = "));
  Serial.println (2048 - getFreeMemory ());  // 2048 bytes = total RAM of an ATmega328P (e.g. Uno)
  Serial.print (F("Largest available memory block = "));
  Serial.println (getLargestAvailableMemoryBlock ());
  Serial.print (F("Largest block in free list = "));
  Serial.println (getLargestBlockInFreeList ());
  Serial.print (F("Number of blocks in free list = "));
  Serial.println (getNumberOfBlocksInFreeList ());
} // end of showMemoryUsed

void setup ()
{
  Serial.begin (115200);
  Serial.println ();

  {
    String s1;
    {
      ProfileTimer t ("concatenating String");
      for (int i = 0; i < STRING_SIZE; i++)
        s1 += 'a';
    } // end timed bit of code
    Serial.println (s1);
    showMemoryUsed ();
    Serial.println ();
  }

  {
    std::string s2;
    {
      ProfileTimer t ("concatenating string");
      for (int i = 0; i < STRING_SIZE; i++)
        s2 += 'a';
    } // end timed bit of code
    Serial.println (s2.c_str ());
    showMemoryUsed ();
    Serial.println ();
  }

  {
    char a [STRING_SIZE + 1];
    {
      ProfileTimer t ("concatenating char array");
      for (int i = 0; i < STRING_SIZE; i++)
        a [i] = 'a';
      a [STRING_SIZE] = 0; // terminating null
    } // end timed bit of code
    Serial.println (a);
    showMemoryUsed ();
    Serial.println ();
  }
} // end of setup

void loop () { }
State machine
An alternative to all this concatenating is to use a "state machine".
See the Wikipedia article "Finite-state machine" for a theoretical description.
In this case you process each character as it arrives (without storing it) and use it to change an internal state. For example, at the start of the line when you are expecting "$GPRMC" to arrive, you might have the following states:
- At start of line (expecting '$')
- Got $, expecting 'G'
- Got G, expecting 'P'
- Got P, expecting 'R'
- Got R, expecting 'M'
... and so on.
Knowing the format of the incoming data you can then split off things like the date, time, latitude and longitude into variables "on the fly" without waiting for the whole string to arrive.
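A minimal sketch of this idea follows, written in standard C++ so it can be tried on a desktop machine. RmcStateMachine is a hypothetical class. It steps through the "$GPRMC" header one character at a time (the states listed above), then buffers only the field currently arriving and keeps fields 1 (time) and 3 (latitude). Fields longer than 15 characters are truncated and the checksum is not verified; both are simplifications.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical names throughout; a sketch, not a complete NMEA parser.
constexpr char kRmcHeader[] = "$GPRMC";  // header the state machine looks for
constexpr std::size_t kFieldMax = 16;    // room for 15 characters plus 0x00

class RmcStateMachine {
 public:
  char utcTime[kFieldMax] = "";   // field 1, e.g. "161229.487"
  char latitude[kFieldMax] = "";  // field 3, e.g. "3723.2475"

  // Process one incoming byte; returns true when a $GPRMC line has just ended.
  bool feed(char c) {
    if (!inFields_) {
      // Header states: matched_ == 0 means "expecting '$'",
      // 1 means "got $, expecting 'G'", and so on.
      if (c == kRmcHeader[matched_]) {
        if (kRmcHeader[++matched_] == '\0') {  // whole "$GPRMC" seen
          inFields_ = true;
          fieldIndex_ = 0;
          fieldLen_ = 0;
        }
      } else {
        matched_ = (c == '$') ? 1 : 0;  // a '$' may start a new header
      }
      return false;
    }
    if (c == ',' || c == '\r' || c == '\n') {
      finishField();
      if (c == ',') {
        ++fieldIndex_;
        return false;
      }
      inFields_ = false;  // end of line: go back to waiting for a header
      matched_ = 0;
      return true;
    }
    if (fieldLen_ < kFieldMax - 1) field_[fieldLen_++] = c;  // truncate long fields
    return false;
  }

 private:
  // Terminate the current field and keep it if it is one we want.
  void finishField() {
    field_[fieldLen_] = '\0';
    if (fieldIndex_ == 1) std::strcpy(utcTime, field_);
    if (fieldIndex_ == 3) std::strcpy(latitude, field_);
    fieldLen_ = 0;
  }

  bool inFields_ = false;
  std::size_t matched_ = 0;
  std::size_t fieldIndex_ = 0;
  std::size_t fieldLen_ = 0;
  char field_[kFieldMax] = "";
};
```

On an Arduino you would call feed (Serial.read ()) from loop whenever Serial.available () is positive, and use utcTime and latitude whenever feed returns true. Memory use stays the same however long the line is.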
There is more discussion, with examples, here:
http://www.gammon.com.au/serial
The advantage of the state machine is that you don't need to allocate memory for the whole string, so it could be thousands of bytes long (more than the memory of the Arduino), as long as there is room for the "interesting" part (like the latitude and longitude).
Summary
The STL "string" class is considerably faster than the Arduino "String" class (468 µs compared to 2,480 µs, about five times faster) and compiles into less program memory (2,994 bytes compared to 3,746 bytes). One drawback is the memory fragmentation (the free block of 115 bytes). This is probably there because the string class allocates memory in larger chunks rather than growing its buffer for each concatenated byte, as the String class does. That saves time, but can leave a larger unused block behind.
However, using "C-style" strings (as shown above) is the fastest method, uses the least memory (RAM), and uses the least program memory. The drawbacks are that they are a bit fiddlier to use (but not much), and you need to decide in advance how much memory to allocate for the final string.
Using a state machine results in the least memory usage, which is handy if the incoming string is potentially very large (like an HTTP request). However, it is probably the most complex method to code and debug.
Method            Time for 100 bytes (µs)   Memory used (bytes)   Sketch size (bytes)
C-style string                         44                   517                 1,782
STL string                            468                   554                 2,994
String                              2,480                   526                 3,746
Notes on C-style strings
So-called "C style" strings are really arrays of type "char" (usually). For example:
char myString [10] = "HELLO";
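There is no separate "length" field, so many C functions expect the string to be "null-terminated", that is, followed by a 0x00 byte. In memory, myString looks like this (this initialization also sets the unused bytes after the terminator to zero):
'H' 'E' 'L' 'L' 'O' 0x00 0x00 0x00 0x00 0x00
The array is 10 bytes long; however, you can only store 9 characters in it because you need to allow for the string terminator (the 0x00 byte). The "active" length can be found by calling the strlen function. For example: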
Serial.println ( strlen (myString) ); // prints: 5
The total size of the array can be found with the sizeof operator. For example:
Serial.println ( sizeof (myString) ); // prints: 10
You can concatenate entire strings by using strcat (string catenate). For example:
strcat (myString, "WORLD");
Note that in this particular example the 10-byte array cannot hold HELLOWORLD plus the trailing 0x00 byte (11 bytes in all), so strcat would write past the end of the array. That is undefined behaviour: it may crash the program or silently corrupt other variables. For this reason you must keep careful track of how many bytes are in C-style strings, particularly if you are adding to their length.
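One way to stay within the array is to work out how much room is left before appending. The helper below is a hypothetical sketch, not a library function. It relies on strncat, which is standard C (and also provided by avr-libc), copies at most the given number of characters, and always adds the terminating 0x00. It assumes dst already holds a properly null-terminated string that fits in dstSize bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: append src to the null-terminated string in dst, where
// dst is an array of dstSize bytes. Copies only what fits (always leaving room
// for the 0x00 terminator) and returns true if nothing was cut off.
bool safeAppend(char* dst, std::size_t dstSize, const char* src) {
  std::size_t used = std::strlen(dst);    // current "active" length
  std::size_t room = dstSize - used - 1;  // free bytes, excluding the terminator
  std::strncat(dst, src, room);           // appends at most room chars, then 0x00
  return std::strlen(src) <= room;
}
```

Called as safeAppend (myString, sizeof (myString), "WORLD") on the 10-byte example, it stores "HELLOWORL" and returns false instead of overrunning the array.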
Note that if you use the STL string class, you can use the length function to find the current string length, and the capacity function to find the currently allocated size. For example:
std::string myString = "HELLO";
myString.reserve (50); // reserve 50 characters
Serial.println (myString.length ()); // prints: 5
Serial.println (myString.capacity ()); // prints at least 50 (the exact value depends on the implementation)