Ghidra 11.3.2
Ghidra internal decompiler documentation.
Storage for decoding and storing strings associated with an address.
#include <stringmanage.hh>
Classes | |
class | StringData |
String data (a sequence of bytes) stored by StringManager. | |
Public Member Functions | |
StringManager (int4 max) | |
Constructor. | |
virtual | ~StringManager (void) |
Destructor. | |
void | clear (void) |
Clear out any cached strings. | |
bool | isString (const Address &addr, Datatype *charType) |
virtual const vector< uint1 > & | getStringData (const Address &addr, Datatype *charType, bool &isTrunc)=0 |
Retrieve string data at the given address as a UTF8 byte array. | |
uint8 | registerInternalStringData (const Address &addr, const uint1 *buf, int4 size, Datatype *charType) |
Associate string data at a code address or other location that doesn't hold string data normally. | |
void | encode (Encoder &encoder) const |
Encode cached strings to a stream. | |
void | decode (Decoder &decoder) |
Restore string cache from a stream. | |
Static Public Member Functions | |
static bool | hasCharTerminator (const uint1 *buffer, int4 size, int4 charsize) |
Check for a unicode string terminator. | |
static int4 | readUtf16 (const uint1 *buf, bool bigend) |
Read a UTF16 code point from a byte array. | |
static void | writeUtf8 (ostream &s, int4 codepoint) |
Write unicode character to stream in UTF8 encoding. | |
static int4 | checkCharacters (const uint1 *buf, int4 size, int4 charsize, bool bigend) |
Make sure buffer has valid bounded set of unicode. | |
static int4 | getCodepoint (const uint1 *buf, int4 charsize, bool bigend, int4 &skip) |
Extract next unicode codepoint. | |
Protected Member Functions | |
bool | writeUnicode (ostream &s, const uint1 *buffer, int4 size, int4 charsize, bool bigend) |
Translate/copy unicode to UTF8. | |
void | assignStringData (StringData &data, const uint1 *buf, int4 size, int4 charsize, int4 numChars, bool bigend) |
Translate and assign raw string data to a StringData object. | |
Static Protected Member Functions | |
static uint8 | calcInternalHash (const Address &addr, const uint1 *buf, int4 size) |
Calculate hash of a specific Address and contents of a byte array. | |
Protected Attributes | |
map< Address, StringData > | stringMap |
Map from address to string data. | |
int4 | maximumChars |
Maximum characters in a string before truncating. | |
Storage for decoding and storing strings associated with an address.
Looks at data in the loadimage to determine if it represents a "string". Decodes the string for presentation in the output. Stores the decoded string until it is needed for presentation. Strings are associated with their starting address in memory. An internal string (that is not in the loadimage) can be registered with the manager and will be associated with a constant.
ghidra::StringManager::StringManager | ( | int4 | max | ) |
max | is the maximum number of characters to allow before truncating string |
References maximumChars.
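void ghidra::StringManager::assignStringData | ( | StringData & | data, |
const uint1 * | buf, | ||
int4 | size, | ||
int4 | charsize, | ||
int4 | numChars, | ||
bool | bigend | ||
) |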
protected |
Translate and assign raw string data to a StringData object.
The string data is provided as raw bytes. The data is translated to UTF-8 and truncated to the maximumChars allowed by the manager. The encoding must be legal unicode, as verified by checkCharacters().
data | is the StringData object to populate |
buf | is the raw byte array |
size | is the number of bytes in the array |
charsize | is the size in bytes of a unicode character element (1=UTF8, 2=UTF16, 4=UTF32) |
numChars | is the number of characters in the encoding as returned by checkCharacters() |
bigend | is true if UTF-16 and UTF-32 elements are big endian encoded |
References ghidra::StringManager::StringData::byteData, ghidra::StringManager::StringData::isTruncated, maximumChars, and writeUnicode().
Referenced by ghidra::StringManagerUnicode::getStringData(), and registerInternalStringData().
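uint8 ghidra::StringManager::calcInternalHash | ( | const Address & | addr, |
const uint1 * | buf, | ||
int4 | size | ||
) |
static protected |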
Calculate hash of a specific Address and contents of a byte array.
Calculate a 32-bit CRC of the bytes and XOR into the upper part of the Address offset.
addr | is the specific Address |
buf | is a pointer to the array of bytes |
size | is the number of bytes in the array |
References ghidra::crc_update(), and ghidra::Address::getOffset().
Referenced by registerInternalStringData().
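The hashing step described above can be sketched as follows. This is only an illustration under stated assumptions: `crcUpdateSketch` and `calcInternalHashSketch` are hypothetical names, the Address is modeled as a bare 64-bit offset, and the CRC step uses the common reflected CRC-32 polynomial, which may differ from what `ghidra::crc_update()` actually computes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ghidra::crc_update(): one byte step of the common
// reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit.
// Assumption: the real routine may use a different table or polynomial.
std::uint32_t crcUpdateSketch(std::uint32_t crc, std::uint8_t byte)
{
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit) {
    std::uint32_t mask = 0u - (crc & 1u);      // all ones if the low bit is set
    crc = (crc >> 1) ^ (0xEDB88320u & mask);
  }
  return crc;
}

// Hypothetical stand-in for calcInternalHash(); the Address is modeled as a
// plain 64-bit offset instead of a ghidra::Address.
std::uint64_t calcInternalHashSketch(std::uint64_t offset, const std::uint8_t *buf, int size)
{
  assert(buf != nullptr && size >= 0);
  std::uint32_t crc = 0;
  for (int i = 0; i < size; ++i)
    crc = crcUpdateSketch(crc, buf[i]);
  // XOR the 32-bit CRC into the upper half of the offset
  return offset ^ (static_cast<std::uint64_t>(crc) << 32);
}
```

With this layout the low 32 bits of the offset pass through unchanged, so two different byte arrays registered at the same offset differ only in the upper half of the hash.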
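int4 ghidra::StringManager::checkCharacters | ( | const uint1 * | buf, |
int4 | size, | ||
int4 | charsize, | ||
bool | bigend | ||
) |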
static |
Make sure buffer has valid bounded set of unicode.
Check that the given buffer contains valid unicode. If the string is encoded in UTF8 or ASCII, each character provides (on average) about a bit of validity checking. For UTF16, the reserved surrogate area gives at least some checking.
buf | is the byte array to check |
size | is the size of the buffer in bytes |
charsize | is the UTF encoding (1=UTF8, 2=UTF16, 4=UTF32) |
bigend | is true if the (UTF16 and UTF32) characters are big endian encoded |
References getCodepoint().
Referenced by ghidra::StringManagerUnicode::getStringData(), and registerInternalStringData().
void ghidra::StringManager::decode | ( | Decoder & | decoder | ) |
Restore string cache from a stream.
Parse a <stringmanage> element, with <string> children.
decoder | is the stream decoder |
References ghidra::StringManager::StringData::byteData, ghidra::Decoder::closeElement(), ghidra::Address::decode(), ghidra::StringManager::StringData::isTruncated, ghidra::Decoder::openElement(), ghidra::Decoder::readBool(), ghidra::Decoder::readString(), and stringMap.
Referenced by ghidra::Architecture::restoreXml().
void ghidra::StringManager::encode | ( | Encoder & | encoder | ) | const |
Encode cached strings to a stream.
Encode <stringmanage> element, with <string> children.
encoder | is the stream encoder |
References ghidra::StringManager::StringData::byteData, ghidra::Encoder::closeElement(), ghidra::StringManager::StringData::isTruncated, ghidra::Encoder::openElement(), stringMap, ghidra::Encoder::writeBool(), and ghidra::Encoder::writeString().
Referenced by ghidra::Architecture::encode().
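int4 ghidra::StringManager::getCodepoint | ( | const uint1 * | buf, |
int4 | charsize, | ||
bool | bigend, | ||
int4 & | skip | ||
) |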
static |
Extract next unicode codepoint.
One or more bytes are consumed from the array, and the number of bytes used is passed back.
buf | is a pointer to the bytes in the character array |
charsize | is 1 for UTF8, 2 for UTF16, or 4 for UTF32 |
bigend | is true for big endian encoding of the UTF element |
skip | is a reference for passing back the number of bytes consumed |
References readUtf16().
Referenced by checkCharacters(), ghidra::PrintLanguage::escapeCharacterData(), and writeUnicode().
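A minimal sketch of this kind of extraction is shown below, assuming that an invalid encoding is reported as -1 and that the caller guarantees enough readable bytes. `getCodepointSketch` is a hypothetical name, not the Ghidra implementation, and this version does not reject overlong UTF8 sequences.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of codepoint extraction. Returns the codepoint, or -1
// (an assumed error convention) for an invalid encoding. On success, `skip`
// receives the number of bytes consumed.
int getCodepointSketch(const std::uint8_t *buf, int charsize, bool bigend, int &skip)
{
  assert(charsize == 1 || charsize == 2 || charsize == 4);
  std::uint32_t cp = 0;
  int used = 0;
  if (charsize == 2) {                       // UTF16
    cp = bigend ? (buf[0] << 8) | buf[1] : (buf[1] << 8) | buf[0];
    used = 2;
    if (cp >= 0xD800 && cp <= 0xDBFF) {      // high surrogate: a low surrogate must follow
      std::uint32_t trail = bigend ? (buf[2] << 8) | buf[3] : (buf[3] << 8) | buf[2];
      used = 4;
      if (trail < 0xDC00 || trail > 0xDFFF) return -1;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (trail - 0xDC00);
    }
    else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return -1;                             // low surrogate without a preceding high one
    }
  }
  else if (charsize == 4) {                  // UTF32
    for (int i = 0; i < 4; ++i)
      cp = (cp << 8) | buf[bigend ? i : 3 - i];
    used = 4;
    if (cp > 0x10FFFF) return -1;
  }
  else {                                     // UTF8
    std::uint32_t lead = buf[0];
    int extra;
    if (lead < 0x80) { cp = lead; extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else return -1;                          // stray continuation byte or invalid lead
    for (int i = 1; i <= extra; ++i) {
      if ((buf[i] & 0xC0) != 0x80) return -1;
      cp = (cp << 6) | (buf[i] & 0x3F);
    }
    used = 1 + extra;
  }
  skip = used;
  return static_cast<int>(cp);
}
```

A caller would typically advance its read position by `skip` after each successful call and stop at a zero codepoint or a negative return value.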
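virtual const vector< uint1 > & ghidra::StringManager::getStringData | ( | const Address & | addr, |
Datatype * | charType, | ||
bool & | isTrunc | ||
) |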
pure virtual |
Retrieve string data at the given address as a UTF8 byte array.
If the address does not represent string data, a zero-length vector is returned. Otherwise, the string data is fetched, converted to a UTF8 encoding, cached, and returned.
addr | is the given address |
charType | is a character data-type indicating the encoding |
isTrunc | passes back whether the string is truncated |
Implemented in ghidra::StringManagerUnicode, and ghidra::GhidraStringManager.
Referenced by isString(), and ghidra::PrintC::printCharacterConstant().
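bool ghidra::StringManager::hasCharTerminator | ( | const uint1 * | buffer, |
int4 | size, | ||
int4 | charsize | ||
) |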
static |
Check for a unicode string terminator.
buffer | is the byte buffer |
size | is the number of bytes in the buffer |
charsize | is the presumed size (in bytes) of character elements |
Referenced by ghidra::StringManagerUnicode::getStringData().
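One plausible reading of this check is sketched below. It assumes that a terminator is a character element, aligned to `charsize`, whose bytes are all zero; the page itself does not spell this out, and `hasCharTerminatorSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: scan the buffer one character element at a time and
// report whether any element consists entirely of zero bytes.
// Assumption: elements are aligned to multiples of charsize from the start.
bool hasCharTerminatorSketch(const std::uint8_t *buffer, int size, int charsize)
{
  assert(buffer != nullptr && charsize > 0);
  for (int i = 0; i + charsize <= size; i += charsize) {
    bool allZero = true;
    for (int j = 0; j < charsize; ++j) {
      if (buffer[i + j] != 0) {
        allZero = false;
        break;
      }
    }
    if (allZero)
      return true;
  }
  return false;
}
```

Under this reading, the zero high bytes inside little endian UTF16 text do not count as a terminator, because they never fill a whole element on their own.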
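bool ghidra::StringManager::isString | ( | const Address & | addr, |
Datatype * | charType | ||
) |
Returns true if the data is some kind of complete string. A given character data-type can be used as a hint for the encoding. The string decoding can be cached internally.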
addr | is the given address |
charType | is the given character data-type |
References getStringData().
Referenced by ghidra::RulePtrsubCharConstant::applyOp().
int4 ghidra::StringManager::readUtf16 | ( | const uint1 * | buf, |
bool | bigend | ||
) |
inline static |
Read a UTF16 code point from a byte array.
Pull the first two bytes from the byte array and combine them in the indicated endian order.
buf | is the byte array |
bigend | is true to request big endian encoding |
Referenced by getCodepoint().
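The byte combination can be sketched in a few lines. `readUtf16Sketch` is a hypothetical name used for illustration; it returns a single 16-bit element and does not interpret surrogate pairs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: combine the first two bytes of the array into one
// 16-bit value, most significant byte first when bigend is true.
int readUtf16Sketch(const std::uint8_t *buf, bool bigend)
{
  assert(buf != nullptr);
  if (bigend)
    return (buf[0] << 8) | buf[1];
  return (buf[1] << 8) | buf[0];
}
```

For the bytes 0x12 0x34, this yields 0x1234 when `bigend` is true and 0x3412 otherwise.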
uint8 ghidra::StringManager::registerInternalStringData | ( | const Address & | addr, |
const uint1 * | buf, | ||
int4 | size, | ||
Datatype * | charType | ||
) |
Associate string data at a code address or other location that doesn't hold string data normally.
The given byte buffer is decoded, and if it represents a legal string, a non-zero hash is returned, constructed from an Address associated with the string and the string data itself. The registered string can be retrieved via the getStringData() method using this hash as a constant Address. If the string is not legal, 0 is returned.
addr | is the address to associate with the string data |
buf | is a pointer to the array of raw bytes encoding the string |
size | is the number of bytes in the array |
charType | is a character data-type indicating the encoding |
References assignStringData(), ghidra::StringManager::StringData::byteData, calcInternalHash(), checkCharacters(), ghidra::AddrSpaceManager::getConstant(), ghidra::AddrSpace::getManager(), ghidra::Datatype::getSize(), ghidra::Address::getSpace(), ghidra::Address::isBigEndian(), ghidra::StringManager::StringData::isTruncated, and stringMap.
Referenced by ghidra::Funcdata::getInternalString().
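bool ghidra::StringManager::writeUnicode | ( | ostream & | s, |
const uint1 * | buffer, | ||
int4 | size, | ||
int4 | charsize, | ||
bool | bigend | ||
) |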
protected |
Translate/copy unicode to UTF8.
Assume the buffer contains a null-terminated unicode encoded string. Write the characters out (as UTF8) to the stream.
s | is the output stream |
buffer | is the given byte buffer |
size | is the number of bytes in the buffer |
charsize | specifies the encoding (1=UTF8, 2=UTF16, 4=UTF32) |
bigend | is true if (UTF16 and UTF32) are big endian encoded |
References getCodepoint(), maximumChars, and writeUtf8().
Referenced by assignStringData().
void ghidra::StringManager::writeUtf8 | ( | ostream & | s, |
int4 | codepoint | ||
) |
static |
Write unicode character to stream in UTF8 encoding.
Encode the given unicode codepoint as UTF8 (1, 2, 3, or 4 bytes) and write the bytes to the stream.
s | is the output stream |
codepoint | is the unicode codepoint |
Referenced by ghidra::PrintC::printUnicode(), ghidra::PrintJava::printUnicode(), and writeUnicode().
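The 1 to 4 byte encoding described above follows the standard UTF8 bit layout, which can be sketched as below. `writeUtf8Sketch` and the `utf8Of` helper are hypothetical names for illustration, not the Ghidra source, and the sketch assumes the codepoint is already in the range 0 to 0x10FFFF.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch of UTF8 encoding: pick the sequence length from the
// codepoint's magnitude, then emit a lead byte followed by continuation
// bytes that each carry 6 payload bits.
void writeUtf8Sketch(std::ostream &s, int codepoint)
{
  assert(codepoint >= 0 && codepoint <= 0x10FFFF);
  if (codepoint < 0x80) {                    // 1 byte: 0xxxxxxx
    s.put(static_cast<char>(codepoint));
  }
  else if (codepoint < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
    s.put(static_cast<char>(0xC0 | (codepoint >> 6)));
    s.put(static_cast<char>(0x80 | (codepoint & 0x3F)));
  }
  else if (codepoint < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    s.put(static_cast<char>(0xE0 | (codepoint >> 12)));
    s.put(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
    s.put(static_cast<char>(0x80 | (codepoint & 0x3F)));
  }
  else {                                     // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    s.put(static_cast<char>(0xF0 | (codepoint >> 18)));
    s.put(static_cast<char>(0x80 | ((codepoint >> 12) & 0x3F)));
    s.put(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
    s.put(static_cast<char>(0x80 | (codepoint & 0x3F)));
  }
}

// Hypothetical convenience wrapper: encode one codepoint into a std::string.
std::string utf8Of(int codepoint)
{
  std::ostringstream s;
  writeUtf8Sketch(s, codepoint);
  return s.str();
}
```

For example, `utf8Of(0x20AC)` produces the three bytes E2 82 AC, the UTF8 form of the euro sign.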