Ghidra 11.3.2
Ghidra internal decompiler documentation.
ghidra::StringManager Class Reference (abstract)

Storage for decoding and storing strings associated with an address.

#include <stringmanage.hh>

Classes

class  StringData
 String data (a sequence of bytes) stored by StringManager.
 

Public Member Functions

 StringManager (int4 max)
 Constructor.
 
virtual ~StringManager (void)
 Destructor.
 
void clear (void)
 Clear out any cached strings.
 
bool isString (const Address &addr, Datatype *charType)
 
virtual const vector< uint1 > & getStringData (const Address &addr, Datatype *charType, bool &isTrunc)=0
 Retrieve string data at the given address as a UTF8 byte array.
 
uint8 registerInternalStringData (const Address &addr, const uint1 *buf, int4 size, Datatype *charType)
 Associate string data at a code address or other location that doesn't hold string data normally.
 
void encode (Encoder &encoder) const
 Encode cached strings to a stream.
 
void decode (Decoder &decoder)
 Restore string cache from a stream.
 

Static Public Member Functions

static bool hasCharTerminator (const uint1 *buffer, int4 size, int4 charsize)
 Check for a unicode string terminator.
 
static int4 readUtf16 (const uint1 *buf, bool bigend)
 Read a UTF16 code point from a byte array.
 
static void writeUtf8 (ostream &s, int4 codepoint)
 Write unicode character to stream in UTF8 encoding.
 
static int4 checkCharacters (const uint1 *buf, int4 size, int4 charsize, bool bigend)
 Make sure buffer has valid bounded set of unicode.
 
static int4 getCodepoint (const uint1 *buf, int4 charsize, bool bigend, int4 &skip)
 Extract next unicode codepoint.
 

Protected Member Functions

bool writeUnicode (ostream &s, const uint1 *buffer, int4 size, int4 charsize, bool bigend)
 Translate/copy unicode to UTF8.
 
void assignStringData (StringData &data, const uint1 *buf, int4 size, int4 charsize, int4 numChars, bool bigend)
 Translate and assign raw string data to a StringData object.
 

Static Protected Member Functions

static uint8 calcInternalHash (const Address &addr, const uint1 *buf, int4 size)
 Calculate hash of a specific Address and contents of a byte array.
 

Protected Attributes

map< Address, StringData > stringMap
 Map from address to string data.
 
int4 maximumChars
 Maximum characters in a string before truncating.
 

Detailed Description

Storage for decoding and storing strings associated with an address.

Looks at data in the loadimage to determine if it represents a "string". Decodes the string for presentation in the output. Stores the decoded string until it is needed for presentation. Strings are associated with their starting address in memory. An internal string (that is not in the loadimage) can be registered with the manager and will be associated with a constant.

Constructor & Destructor Documentation

◆ StringManager()

ghidra::StringManager::StringManager ( int4  max)

Constructor.

Parameters
max  is the maximum number of characters to allow before truncating the string

References maximumChars.

Member Function Documentation

◆ assignStringData()

void ghidra::StringManager::assignStringData ( StringData &  data,
const uint1 *  buf,
int4  size,
int4  charsize,
int4  numChars,
bool  bigend 
)
protected

Translate and assign raw string data to a StringData object.

The string data is provided as raw bytes. The data is translated to UTF-8 and truncated to the maximumChars allowed by the manager. The encoding must be legal unicode, as verified by checkCharacters().

Parameters
data  is the StringData object to populate
buf  is the raw byte array
size  is the number of bytes in the array
charsize  is the size of unicode encoding
numChars  is the number of characters in the encoding as returned by checkCharacters()
bigend  is true if UTF-16 and UTF-32 elements are big endian encoded

References ghidra::StringManager::StringData::byteData, ghidra::StringManager::StringData::isTruncated, maximumChars, and writeUnicode().

Referenced by ghidra::StringManagerUnicode::getStringData(), and registerInternalStringData().

◆ calcInternalHash()

uint8 ghidra::StringManager::calcInternalHash ( const Address &  addr,
const uint1 *  buf,
int4  size 
)
static protected

Calculate hash of a specific Address and contents of a byte array.

Calculate a 32-bit CRC of the bytes and XOR into the upper part of the Address offset.

Parameters
addr  is the specific Address
buf  is a pointer to the array of bytes
size  is the number of bytes in the array
Returns
the 64-bit hash

References ghidra::crc_update(), and ghidra::Address::getOffset().

Referenced by registerInternalStringData().
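
The hashing scheme described above can be illustrated with a minimal sketch. It is not the Ghidra implementation: `crc32Sketch` and `internalHashSketch` are hypothetical names, the address is reduced to a plain 64-bit offset, and a standard reflected CRC-32 (polynomial 0xEDB88320) stands in for ghidra::crc_update(), whose table and seed may differ.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in CRC: standard reflected CRC-32, computed bit by bit.
// ASSUMPTION: ghidra::crc_update() may use a different table or seed.
uint32_t crc32Sketch(const uint8_t *buf, int size)
{
  assert(size >= 0);
  uint32_t crc = 0xFFFFFFFFu;
  for (int i = 0; i < size; ++i) {
    crc ^= buf[i];
    for (int bit = 0; bit < 8; ++bit) {
      uint32_t mask = 0u - (crc & 1u);       // all ones if the low bit is set
      crc = (crc >> 1) ^ (0xEDB88320u & mask);
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// XOR the 32-bit CRC into the upper half of a 64-bit address offset.
// The plain 'offset' parameter is a hypothetical stand-in for Address::getOffset().
uint64_t internalHashSketch(uint64_t offset, const uint8_t *buf, int size)
{
  return offset ^ (static_cast<uint64_t>(crc32Sketch(buf, size)) << 32);
}
```

Because the CRC only affects the upper 32 bits, the low half of the offset survives unchanged in the hash, while different byte contents at the same offset yield different hashes.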

◆ checkCharacters()

int4 ghidra::StringManager::checkCharacters ( const uint1 *  buf,
int4  size,
int4  charsize,
bool  bigend 
)
static

Make sure buffer has valid bounded set of unicode.

Check that the given buffer contains valid unicode. If the string is encoded in UTF8 or ASCII, each character contributes (on average) about one bit of validation. For UTF16, the reserved surrogate area provides at least some validation.

Parameters
buf  is the byte array to check
size  is the size of the buffer in bytes
charsize  is the UTF encoding (1=UTF8, 2=UTF16, 4=UTF32)
bigend  is true if the (UTF16 and UTF32) characters are big endian encoded
Returns
the number of characters or -1 if there is an invalid encoding

References getCodepoint().

Referenced by ghidra::StringManagerUnicode::getStringData(), and registerInternalStringData().

◆ decode()

void ghidra::StringManager::decode ( Decoder &  decoder)

Restore string cache from a stream.

◆ encode()

void ghidra::StringManager::encode ( Encoder &  encoder) const

Encode cached strings to a stream.

Encode a <stringmanage> element, with <string> children.

Parameters
encoder  is the stream encoder

References ghidra::StringManager::StringData::byteData, ghidra::Encoder::closeElement(), ghidra::StringManager::StringData::isTruncated, ghidra::Encoder::openElement(), stringMap, ghidra::Encoder::writeBool(), and ghidra::Encoder::writeString().

Referenced by ghidra::Architecture::encode().

◆ getCodepoint()

int4 ghidra::StringManager::getCodepoint ( const uint1 *  buf,
int4  charsize,
bool  bigend,
int4 &  skip 
)
static

Extract next unicode codepoint.

One or more bytes are consumed from the array, and the number of bytes used is passed back.

Parameters
buf  is a pointer to the bytes in the character array
charsize  is 1 for UTF8, 2 for UTF16, or 4 for UTF32
bigend  is true for big endian encoding of the UTF element
skip  is a reference for passing back the number of bytes consumed
Returns
the codepoint or -1 if the encoding is invalid

References readUtf16().

Referenced by checkCharacters(), ghidra::PrintLanguage::escapeCharacterData(), and writeUnicode().
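
A minimal sketch of this kind of decoding step follows. It is an illustration under stated assumptions, not the Ghidra source: `getCodepointSketch` and `read16Sketch` are hypothetical names, the caller is assumed to guarantee that enough bytes remain in the buffer, and overlong UTF8 forms are not rejected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combine two bytes in the requested endian order.
static int read16Sketch(const uint8_t *buf, bool bigend)
{
  return bigend ? (buf[0] << 8) | buf[1] : (buf[1] << 8) | buf[0];
}

// Decode one code point; 'skip' receives the number of bytes examined.
// Returns -1 on an invalid encoding. ASSUMPTION: the buffer holds enough bytes.
int getCodepointSketch(const uint8_t *buf, int charsize, bool bigend, int &skip)
{
  assert(charsize == 1 || charsize == 2 || charsize == 4);
  if (charsize == 4) {                       // UTF32: one fixed-size element
    skip = 4;
    uint32_t v = bigend
      ? ((uint32_t)buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]
      : ((uint32_t)buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
    if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) return -1;
    return (int)v;
  }
  if (charsize == 2) {                       // UTF16: one element or a surrogate pair
    int first = read16Sketch(buf, bigend);
    skip = 2;
    if (first >= 0xD800 && first <= 0xDBFF) {
      int second = read16Sketch(buf + 2, bigend);
      if (second < 0xDC00 || second > 0xDFFF) return -1;
      skip = 4;
      return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    }
    if (first >= 0xDC00 && first <= 0xDFFF) return -1;  // unpaired low surrogate
    return first;
  }
  int lead = buf[0];                         // UTF8: lead byte selects the length
  int extra, cp;
  skip = 1;
  if (lead < 0x80) return lead;
  if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
  else return -1;
  for (int i = 1; i <= extra; ++i) {
    if ((buf[i] & 0xC0) != 0x80) { skip = i; return -1; }  // not a continuation byte
    cp = (cp << 6) | (buf[i] & 0x3F);
  }
  skip = extra + 1;
  return cp;
}
```

A caller would typically loop, advancing its read pointer by `skip` after each call and stopping at the terminator or at the first -1.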

◆ getStringData()

virtual const vector< uint1 > & ghidra::StringManager::getStringData ( const Address &  addr,
Datatype *  charType,
bool &  isTrunc 
)
pure virtual

Retrieve string data at the given address as a UTF8 byte array.

If the address does not represent string data, a zero length vector is returned. Otherwise, the string data is fetched, converted to a UTF8 encoding, cached and returned.

Parameters
addr  is the given address
charType  is a character data-type indicating the encoding
isTrunc  passes back whether the string is truncated
Returns
the byte array of UTF8 data

Implemented in ghidra::StringManagerUnicode, and ghidra::GhidraStringManager.

Referenced by isString(), and ghidra::PrintC::printCharacterConstant().

◆ hasCharTerminator()

bool ghidra::StringManager::hasCharTerminator ( const uint1 *  buffer,
int4  size,
int4  charsize 
)
static

Check for a unicode string terminator.

Parameters
buffer  is the byte buffer
size  is the number of bytes in the buffer
charsize  is the presumed size (in bytes) of character elements
Returns
true if a string terminator is found

Referenced by ghidra::StringManagerUnicode::getStringData().
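
One plausible way to perform such a check is sketched below. It assumes that a terminator is an element of `charsize` bytes that are all zero and that elements are aligned to the start of the buffer; `hasCharTerminatorSketch` is a hypothetical name, and the actual Ghidra routine may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Scan the buffer one character element at a time and report whether
// any complete element consists entirely of zero bytes.
bool hasCharTerminatorSketch(const uint8_t *buffer, int size, int charsize)
{
  assert(charsize > 0);
  for (int i = 0; i + charsize <= size; i += charsize) {
    bool allZero = true;
    for (int j = 0; j < charsize; ++j) {
      if (buffer[i + j] != 0) {
        allZero = false;
        break;
      }
    }
    if (allZero) return true;
  }
  return false;
}
```

Stepping by whole elements matters: in UTF16 text, zero bytes that belong to two neighbouring elements do not form a terminator.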

◆ isString()

bool ghidra::StringManager::isString ( const Address &  addr,
Datatype *  charType 
)

Returns true if the data is some kind of complete string. A given character data-type can be used as a hint for the encoding. The string decoding can be cached internally.

Parameters
addr  is the given address
charType  is the given character data-type
Returns
true if the address represents string data

References getStringData().

Referenced by ghidra::RulePtrsubCharConstant::applyOp().

◆ readUtf16()

int4 ghidra::StringManager::readUtf16 ( const uint1 *  buf,
bool  bigend 
)
inline static

Read a UTF16 code point from a byte array.

Pull the first two bytes from the byte array and combine them in the indicated endian order.

Parameters
buf  is the byte array
bigend  is true to request big endian encoding
Returns
the decoded UTF16 element

Referenced by getCodepoint().
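
The byte combination can be sketched as follows. `readUtf16Sketch` is a hypothetical stand-alone function written for illustration, using `uint8_t` and `int` in place of Ghidra's `uint1` and `int4` typedefs.

```cpp
#include <cassert>
#include <cstdint>

// Combine the first two bytes of 'buf' into one 16-bit UTF16 element.
// Big endian: buf[0] is the high byte. Little endian: buf[1] is the high byte.
int readUtf16Sketch(const uint8_t *buf, bool bigend)
{
  assert(buf != nullptr);
  if (bigend)
    return (buf[0] << 8) | buf[1];
  return (buf[1] << 8) | buf[0];
}
```

The result is a single UTF16 element, not necessarily a full code point: values in the surrogate range still need to be paired by the caller.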

◆ registerInternalStringData()

uint8 ghidra::StringManager::registerInternalStringData ( const Address &  addr,
const uint1 *  buf,
int4  size,
Datatype *  charType 
)

Associate string data at a code address or other location that doesn't hold string data normally.

The given byte buffer is decoded, and if it represents a legal string, a non-zero hash is returned, constructed from an Address associated with the string and the string data itself. The registered string can be retrieved via the getStringData() method using this hash as a constant Address. If the string is not legal, 0 is returned.

Parameters
addr  is the address to associate with the string data
buf  is a pointer to the array of raw bytes encoding the string
size  is the number of bytes in the array
charType  is a character data-type indicating the encoding
Returns
a hash associated with the string or 0

References assignStringData(), ghidra::StringManager::StringData::byteData, calcInternalHash(), checkCharacters(), ghidra::AddrSpaceManager::getConstant(), ghidra::AddrSpace::getManager(), ghidra::Datatype::getSize(), ghidra::Address::getSpace(), ghidra::Address::isBigEndian(), ghidra::StringManager::StringData::isTruncated, and stringMap.

Referenced by ghidra::Funcdata::getInternalString().

◆ writeUnicode()

bool ghidra::StringManager::writeUnicode ( ostream &  s,
const uint1 *  buffer,
int4  size,
int4  charsize,
bool  bigend 
)
protected

Translate/copy unicode to UTF8.

Assume the buffer contains a null-terminated, unicode-encoded string. Write the characters out (as UTF8) to the stream.

Parameters
s  is the output stream
buffer  is the given byte buffer
size  is the number of bytes in the buffer
charsize  specifies the encoding (1=UTF8, 2=UTF16, 4=UTF32)
bigend  is true if (UTF16 and UTF32) are big endian encoded
Returns
true if the byte array contains valid unicode

References getCodepoint(), maximumChars, and writeUtf8().

Referenced by assignStringData().

◆ writeUtf8()

void ghidra::StringManager::writeUtf8 ( ostream &  s,
int4  codepoint 
)
static

Write unicode character to stream in UTF8 encoding.

Encode the given unicode codepoint as UTF8 (1, 2, 3, or 4 bytes) and write the bytes to the stream.

Parameters
s  is the output stream
codepoint  is the unicode codepoint

Referenced by ghidra::PrintC::printUnicode(), ghidra::PrintJava::printUnicode(), and writeUnicode().
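
The 1 to 4 byte encoding follows the standard UTF8 bit layout, which can be sketched as below. `writeUtf8Sketch` and `toUtf8Sketch` are hypothetical names used for illustration; this is not the Ghidra source, and it assumes the code point is already known to be in the range 0 to 0x10FFFF.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Encode one code point as UTF8 and write the bytes to the stream.
// ASSUMPTION: the caller passes a code point in [0, 0x10FFFF].
void writeUtf8Sketch(std::ostream &s, int codepoint)
{
  assert(codepoint >= 0 && codepoint <= 0x10FFFF);
  char bytes[4];
  int size;
  if (codepoint < 0x80) {                    // 7 bits: single byte
    bytes[0] = (char)codepoint;
    size = 1;
  } else if (codepoint < 0x800) {            // 11 bits: 110xxxxx 10xxxxxx
    bytes[0] = (char)(0xC0 | (codepoint >> 6));
    bytes[1] = (char)(0x80 | (codepoint & 0x3F));
    size = 2;
  } else if (codepoint < 0x10000) {          // 16 bits: 1110xxxx plus two continuations
    bytes[0] = (char)(0xE0 | (codepoint >> 12));
    bytes[1] = (char)(0x80 | ((codepoint >> 6) & 0x3F));
    bytes[2] = (char)(0x80 | (codepoint & 0x3F));
    size = 3;
  } else {                                   // 21 bits: 11110xxx plus three continuations
    bytes[0] = (char)(0xF0 | (codepoint >> 18));
    bytes[1] = (char)(0x80 | ((codepoint >> 12) & 0x3F));
    bytes[2] = (char)(0x80 | ((codepoint >> 6) & 0x3F));
    bytes[3] = (char)(0x80 | (codepoint & 0x3F));
    size = 4;
  }
  s.write(bytes, size);
}

// Convenience wrapper (hypothetical) that returns the encoded bytes as a string.
std::string toUtf8Sketch(int codepoint)
{
  std::ostringstream s;
  writeUtf8Sketch(s, codepoint);
  return s.str();
}
```

Writing through an `ostream` mirrors the documented signature and lets the same routine feed a string buffer or any other output stream.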


The documentation for this class was generated from the following files: