PHP Classes

File: tests/fixtures/test10Html.txt

Recommend this page to a friend!
  Classes of Lars Moelleken   PHP HTML to Text Conversion   tests/fixtures/test10Html.txt   Download  
File: tests/fixtures/test10Html.txt
Role: Documentation
Content type: text/plain
Description: Documentation
Class: PHP HTML to Text Conversion
Parse HTML and extract text contained in it
Author: By
Last change: [+]: new option for "htmlspecialchars_decode()" usage
[+]: fix string repeat (when html tags are inside other html tags)

-> thanks @ vibrantBits -> https://github.com/voku/html2text/pull/17
Date: 3 years ago
Size: 35,931 bytes
 

Contents

Class file image Download
[](/#portable-utf-8--api) PORTABLE UTF-8 | API The API from the "UTF8"-Class is written as small static methods that will match the default PHP-API e.g. [](/#methods) METHODS [](/#accessstring-str-int-pos) access(string $str, int $pos) Return the character at the specified position: $str[1] like functionality. ```php UTF8::access('fòô', 1); // 'ò' ``` [](/#add_bom_to_stringstring-str) add_bom_to_string(string $str) Prepends UTF-8 BOM character to the string and returns the whole string. If BOM already existed there, the Input string is returned. ```php UTF8::add_bom_to_string('fòô'); // "\xEF\xBB\xBF" . 'fòô' ``` [](/#binary_to_strmixed-bin) binary_to_str(mixed $bin) Convert binary into an string. INFO: opposite to UTF8::str_to_binary() ```php UTF8::binary_to_str('11110000100111111001100010000011'); // '😃' ``` [](/#bom) bom() Returns the UTF-8 Byte Order Mark Character. ```php UTF8::bom(); // "\xEF\xBB\xBF" ``` [](/#chrint-code_point--string) chr(int $code_point) : string Generates a UTF-8 encoded character from the given code point. INFO: opposite to UTF8::ord() ```php UTF8::chr(666); // 'ʚ' ``` [](/#chr_mapstringarray-callback-string-str--array) chr_map(string|array $callback, string $str) : array Applies callback to all characters of a string. ```php UTF8::chr_map(['voku\helper\UTF8', 'strtolower'], 'Κόσμε'); // ['κ','ό', 'σ', 'μ', 'ε'] ``` [](/#chr_size_liststring-str--array) chr_size_list(string $str) : array Generates a UTF-8 encoded character from the given code point. 1 byte => U+0000 - U+007F 2 byte => U+0080 - U+07FF 3 byte => U+0800 - U+FFFF 4 byte => U+10000 - U+10FFFF ```php UTF8::chr_size_list('中文空白-test'); // [3, 3, 3, 3, 1, 1, 1, 1, 1] ``` [](/#chr_to_decimalstring-chr--int) chr_to_decimal(string $chr) : int Get a decimal code representation of a specific character. ```php UTF8::chr_to_decimal('§'); // 0xa7 ``` [](/#chr_to_hexstring-chr-string-pfix--u) chr_to_hex(string $chr, string $pfix = 'U+') Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character. ```php UTF8::chr_to_hex('§'); // 0xa7 ``` [](/#chunk_splitstring-body-int-chunklen--76-string-end--rn--string) chunk_split(string $body, int $chunklen = 76, string $end = "\r\n") : string Splits a string into smaller chunks and multiple lines, using the specified line ending character. ```php UTF8::chunk_split('ABC-ÖÄÜ-中文空白-κόσμε', 3); // "ABC\r\n-ÖÄ\r\nÜ-中\r\n文空白\r\n-κό\r\nσμε" ``` [](/#cleanstring-str-bool-remove_bom--false-bool-normalize_whitespace--false-bool-normalize_msword--false-bool-keep_non_breaking_space--false--string) clean(string $str, bool $remove_bom = false, bool $normalize_whitespace = false, bool $normalize_msword = false, bool $keep_non_breaking_space = false) : string Accepts a string and removes all non-UTF-8 characters from it + extras if needed. ```php UTF8::clean("\xEF\xBB\xBF„Abcdef\xc2\xa0\x20…” — 😃 - Düsseldorf", true, true); // '„Abcdef …” — 😃 - Düsseldorf' ``` [](/#cleanupstring-str--string) cleanup(string $str) : string Clean-up a and show only printable UTF-8 chars at the end + fix UTF-8 encoding. ```php UTF8::cleanup("\xEF\xBB\xBF„Abcdef\xc2\xa0\x20…” — 😃 - Düsseldorf", true, true); // '„Abcdef …” — 😃 - Düsseldorf' ``` [](/#codepointsmixed-arg-bool-u_style--false--array) codepoints(mixed $arg, bool $u_style = false) : array Accepts a string and returns an array of Unicode code points. INFO: opposite to UTF8::string() ```php UTF8::codepoints('κöñ'); // array(954, 246, 241) // ... OR ... UTF8::codepoints('κöñ', true); // array('U+03ba', 'U+00f6', 'U+00f1') ``` [](/#count_charsstring-str-bool-cleanutf8--false--array) count_chars(string $str, bool $cleanUtf8 = false) : array Returns count of characters used in a string. ```php UTF8::count_chars('κaκbκc'); // array('κ' => 3, 'a' => 1, 'b' => 1, 'c' => 1) ``` [](/#encodestring-encoding-string-str-bool-force--true--string) encode(string $encoding, string $str, bool $force = true) : string Encode a string with a new charset-encoding. INFO: The different to "UTF8::utf8_encode()" is that this function, try to fix also broken / double encoding, so you can call this function also on a UTF-8 String and you don't mess the string. ```php UTF8::encode('ISO-8859-1', '-ABC-中文空白-'); // '-ABC-????-' // UTF8::encode('UTF-8', '-ABC-中文空白-'); // '-ABC-中文空白-' ``` [](/#file_get_contentsstring-filename-intnull-flags--null-resourcenull-context--null-intnull-offset--null-intnull-maxlen--null-int-timeout--10-bool-converttoutf8--true--string) file_get_contents(string $filename, int|null $flags = null, resource|null $context = null, int|null $offset = null, int|null $maxlen = null, int $timeout = 10, bool $convertToUtf8 = true) : string Reads entire file into a string. WARNING: do not use UTF-8 Option ($convertToUtf8) for binary-files (e.g.: images) !!! ```php UTF8::file_get_contents('utf16le.txt'); // ... ``` [](/#file_has_bomstring-file_path--bool) file_has_bom(string $file_path) : bool Checks if a file starts with BOM (Byte Order Mark) character. ```php UTF8::file_has_bom('utf8_with_bom.txt'); // true ``` [](/#filtermixed-var-int-normalization_form--4-string-leading_combining----mixed) filter(mixed $var, int $normalization_form = 4, string $leading_combining = '◌') : mixed Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed. ```php UTF8::filter(array("\xE9", 'à', 'a')); // array('é', 'à', 'a') ``` [](/#filter_inputint-type-string-var-int-filter--filter_default-nullarray-option--null--string) filter_input(int $type, string $var, int $filter = FILTER_DEFAULT, null|array $option = null) : string "filter_input()"-wrapper with normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed. ```php // _GET['foo'] = 'bar'; UTF8::filter_input(INPUT_GET, 'foo', FILTER_SANITIZE_STRING)); // 'bar' ``` [](/#filter_input_arrayint-type-mixed-definition--null-bool-add_empty--true--mixed) filter_input_array(int $type, mixed $definition = null, bool $add_empty = true) : mixed "filter_input_array()"-wrapper with normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed. ```php // _GET['foo'] = 'bar'; UTF8::filter_input_array(INPUT_GET, array('foo' => 'FILTER_SANITIZE_STRING')); // array('bar') ``` [](/#filter_varstring-var-int-filter--filter_default-array-option--null--string) filter_var(string $var, int $filter = FILTER_DEFAULT, array $option = null) : string "filter_var()"-wrapper with normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed. ```php UTF8::filter_var('-ABC-中文空白-', FILTER_VALIDATE_URL); // false ``` [](/#filter_var_arrayarray-data-mixed-definition--null-bool-add_empty--true--mixed) filter_var_array(array $data, mixed $definition = null, bool $add_empty = true) : mixed "filter_var_array()"-wrapper with normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed. ```php $filters = [ 'name' => ['filter' => FILTER_CALLBACK, 'options' => ['voku\helper\UTF8', 'ucwords']], 'age' => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 1, 'max_range' => 120]], 'email' => FILTER_VALIDATE_EMAIL, ]; $data = [ 'name' => 'κόσμε', 'age' => '18', 'email' => 'foo@bar.de' ]; UTF8::filter_var_array($data, $filters, true); // ['name' => 'Κόσμε', 'age' => 18, 'email' => 'foo@bar.de'] ``` [](/#fits_insidestring-str-int-box_size--bool) fits_inside(string $str, int $box_size) : bool Check if the number of unicode characters are not more than the specified integer. ```php UTF8::fits_inside('κόσμε', 6); // false ``` [](/#fix_simple_utf8string-str--string) fix_simple_utf8(string $str) : string Try to fix simple broken UTF-8 strings. INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings. ```php UTF8::fix_simple_utf8('Düsseldorf'); // 'Düsseldorf' ``` [](/#fix_utf8stringstring-str--mixed) fix_utf8(string|string[] $str) : mixed Fix a double (or multiple) encoded UTF8 string. ```php UTF8::fix_utf8('Fédération'); // 'Fédération' ``` [](/#getchardirectionstring-char--string-rtl-or-ltr) getCharDirection(string $char) : string ('RTL' or 'LTR') Get character of a specific character. ```php UTF8::getCharDirection('ا'); // 'RTL' ``` [](/#getchardirectionstring-char--string-rtl-or-ltr-1) getCharDirection(string $char) : string ('RTL' or 'LTR') Get character of a specific character. ```php UTF8::getCharDirection('ا'); // 'RTL' ``` [](/#hex_to_intstring-str--intfalse) hex_to_int(string $str) : int|false Converts hexadecimal U+xxxx code point representation to integer. INFO: opposite to UTF8::int_to_hex() ```php UTF8::hex_to_int('U+00f1'); // 241 ``` [](/#html_encodestring-str-bool-keepasciichars--false-string-encoding--utf-8--string) html_encode(string $str, bool $keepAsciiChars = false, string $encoding = 'UTF-8') : string Converts a UTF-8 string to a series of HTML numbered entities. INFO: opposite to UTF8::html_decode() ```php UTF8::html_encode('中文空白'); // '中文空白' ``` [](/#html_entity_decodestring-str-int-flags--null-string-encoding--utf-8--string) html_entity_decode(string $str, int $flags = null, string $encoding = 'UTF-8') : string UTF-8 version of html_entity_decode() The reason we are not using html_entity_decode() by itself is because while it is not technically correct to leave out the semicolon at the end of an entity most browsers will still interpret the entity correctly. html_entity_decode() does not convert entities without semicolons, so we are left with our own little solution here. Bummer. Convert all HTML entities to their applicable characters INFO: opposite to UTF8::html_encode() ```php UTF8::html_encode('中文空白'); // '中文空白' ``` [](/#htmlentitiesstring-str-int-flags--ent_compat-string-encoding--utf-8-bool-double_encode--true--string) htmlentities(string $str, int $flags = ENT_COMPAT, string $encoding = 'UTF-8', bool $double_encode = true) : string Convert all applicable characters to HTML entities: UTF-8 version of htmlentities() ```php UTF8::htmlentities('<白-öäü>'); // '<白-öäü>' ``` [](/#htmlspecialcharsstring-str-int-flags--ent_compat-string-encoding--utf-8-bool-double_encode--true--string) htmlspecialchars(string $str, int $flags = ENT_COMPAT, string $encoding = 'UTF-8', bool $double_encode = true) : string Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars() INFO: Take a look at "UTF8::htmlentities()" ```php UTF8::htmlspecialchars('<白-öäü>'); // '<白-öäü>' ``` [](/#int_to_hexint-int-string-pfix--u--str) int_to_hex(int $int, string $pfix = 'U+') : str Converts Integer to hexadecimal U+xxxx code point representation. INFO: opposite to UTF8::hex_to_int() ```php UTF8::int_to_hex(241); // 'U+00f1' ``` [](/#is_asciistring-str--bool) is_ascii(string $str) : bool Checks if a string is 7 bit ASCII. alias: UTF8::isAscii() ```php UTF8::is_ascii('白'); // false ``` [](/#is_base64string-str--bool) is_base64(string $str) : bool Returns true if the string is base64 encoded, false otherwise. alias: UTF8::isBase64() ```php UTF8::is_base64('4KSu4KWL4KSo4KS/4KSa'); // true ``` [](/#is_binarymixed-input--bool) is_binary(mixed $input) : bool Check if the input is binary... (is look like a hack). alias: UTF8::isBinary() ```php UTF8::is_binary(01); // true ``` [](/#is_binary_filestring-file--bool) is_binary_file(string $file) : bool Check if the file is binary. ```php UTF8::is_binary('./utf32.txt'); // true ``` [](/#is_bomstring-str--bool) is_bom(string $str) : bool Checks if the given string is equal to any "Byte Order Mark". WARNING: Use "UTF8::string_has_bom()" if you will check BOM in a string. alias: UTF8::isBom() ```php UTF8::is_bom("\xef\xbb\xbf"); // true ``` [](/#is_jsonstring-str--bool) is_json(string $str) : bool Try to check if "$str" is an json-string. alias: UTF8::isJson() ```php UTF8::is_json('{"array":[1,"¥","ä"]}'); // true ``` [](/#is_htmlstring-str--bool) is_html(string $str) : bool Check if the string contains any html-tags . alias: UTF8::isHtml() ```php UTF8::is_html('<b>lall</b>'); // true ``` [](/#is_utf16string-str--intfalse) is_utf16(string $str) : int|false Check if the string is UTF-16: This function will return false if is't not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE. alias: UTF8::isUtf16() ```php UTF8::is_utf16(file_get_contents('utf-16-le.txt')); // 1 UTF8::is_utf16(file_get_contents('utf-16-be.txt')); // 2 UTF8::is_utf16(file_get_contents('utf-8.txt')); // false ``` [](/#is_utf32string-str--intfalse) is_utf32(string $str) : int|false Check if the string is UTF-32: This function will return false if is't not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE. alias: UTF8::isUtf16() ```php UTF8::is_utf32(file_get_contents('utf-32-le.txt')); // 1 UTF8::is_utf32(file_get_contents('utf-32-be.txt')); // 2 UTF8::is_utf32(file_get_contents('utf-8.txt')); // false ``` [](/#is_utf8string-str-bool-strict--false--bool) is_utf8(string $str, bool $strict = false) : bool Checks whether the passed string contains only byte sequences that appear valid UTF-8 characters. alias: UTF8::isUtf8() ```php UTF8::is_utf8('Iñtërnâtiônàlizætiøn'); // true UTF8::is_utf8("Iñtërnâtiônàlizætiøn\xA0\xA1"); // false ``` [](/#json_decodestring-json-bool-assoc--false-int-depth--512-int-options--0--mixed) json_decode(string $json, bool $assoc = false, int $depth = 512, int $options = 0) : mixed Decodes a JSON string. ```php UTF8::json_decode('[1,"\u00a5","\u00e4"]'); // array(1, '¥', 'ä') ``` [](/#json_encodemixed-value-int-options--0-int-depth--512--string) json_encode(mixed $value, int $options = 0, int $depth = 512) : string Returns the JSON representation of a value. ```php UTF8::json_enocde(array(1, '¥', 'ä')); // '[1,"\u00a5","\u00e4"]' ``` [](/#lcfirststring-str--string) lcfirst(string $str) : string Makes string's first char lowercase. ```php UTF8::lcfirst('ÑTËRNÂTIÔNÀLIZÆTIØN'); // ñTËRNÂTIÔNÀLIZÆTIØN ``` [](/#maxmixed-arg--string) max(mixed $arg) : string Returns the UTF-8 character with the maximum code point in the given data. ```php UTF8::max('abc-äöü-中文空白'); // 'ø' ``` [](/#max_chr_widthstring-str--int) max_chr_width(string $str) : int Calculates and returns the maximum number of bytes taken by any UTF-8 encoded character in the given string. ```php UTF8::max_chr_width('Intërnâtiônàlizætiøn'); // 2 ``` [](/#minmixed-arg--string) min(mixed $arg) : string Returns the UTF-8 character with the minimum code point in the given data. ```php UTF8::min('abc-äöü-中文空白'); // '-' ``` [](/#normalize_encodingstring-encoding--string) normalize_encoding(string $encoding) : string Normalize the encoding-"name" input. ```php UTF8::normalize_encoding('UTF8'); // 'UTF-8' ``` [](/#normalize_mswordstring-str--string) normalize_msword(string $str) : string Normalize some MS Word special characters. ```php UTF8::normalize_msword('„Abcdef…”'); // '"Abcdef..."' ``` [](/#normalize_whitespacestring-str-bool-keepnonbreakingspace--false-bool-keepbidiunicodecontrols--false--string) normalize_whitespace(string $str, bool $keepNonBreakingSpace = false, bool $keepBidiUnicodeControls = false) : string Normalize the whitespace. ```php UTF8::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -" ``` [](/#ordstring-chr--int) ord(string $chr) : int Calculates Unicode code point of the given UTF-8 encoded character. INFO: opposite to UTF8::chr() ```php UTF8::ord('中'); // 20013 ``` [](/#parse_strstring-str-result--bool) parse_str(string $str, &$result) : bool Parses the string into an array (into the the second parameter). WARNING: Instead of "parse_str()" this method do not (re-)placing variables in the current scope, if the second parameter is not set! ```php UTF8::parse_str('Iñtërnâtiônéàlizætiøn=測試&arr[]=foo+測試&arr[]=ການທົດສອບ', $array); echo $array['Iñtërnâtiônéàlizætiøn']; // '測試' ``` [](/#rangemixed-var1-mixed-var2--array) range(mixed $var1, mixed $var2) : array Create an array containing a range of UTF-8 characters. ```php UTF8::range('κ', 'ζ'); // array('κ', 'ι', 'θ', 'η', 'ζ',) ``` [](/#remove_bomstring-str--string) remove_bom(string $str) : string Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings. ```php UTF8::remove_bom("\xEF\xBB\xBFΜπορώ να"); // 'Μπορώ να' ``` [](/#remove_duplicatesstring-str-stringarray-what-----string) remove_duplicates(string $str, string|array $what = ' ') : string Removes duplicate occurrences of a string in another string. ```php UTF8::remove_duplicates('öäü-κόσμεκόσμε-äöü', 'κόσμε'); // 'öäü-κόσμε-äöü' ``` [](/#remove_invisible_charactersstring-str-bool-url_encoded--true-string-replacement----string) remove_invisible_characters(string $str, bool $url_encoded = true, string $replacement = '') : string Remove invisible characters from a string. ```php UTF8::remove_duplicates("κόσ\0με"); // 'κόσμε' ``` [](/#replace_diamond_question_markstring-str-string-unknown----string) replace_diamond_question_mark(string $str, string $unknown = '?') : string Replace the diamond question mark () with the replacement. ```php UTF8::replace_diamond_question_mark('中文空白'); // '中文空白' ``` [](/#trimstring-str---string-chars--inf--string) trim(string $str = '', string $chars = INF) : string Strip whitespace or other characters from beginning or end of a UTF-8 string. ```php UTF8::rtrim(' -ABC-中文空白- '); // '-ABC-中文空白-' ``` [](/#rtrimstring-str---string-chars--inf--string) rtrim(string $str = '', string $chars = INF) : string Strip whitespace or other characters from end of a UTF-8 string. ```php UTF8::rtrim('-ABC-中文空白- '); // '-ABC-中文空白-' ``` [](/#ltrimstring-str-string-chars--inf--string) ltrim(string $str, string $chars = INF) : string Strip whitespace or other characters from beginning of a UTF-8 string. ```php UTF8::ltrim(' 中文空白 '); // '中文空白 ' ``` [](/#single_chr_html_encodestring-char-bool-keepasciichars--false--string) single_chr_html_encode(string $char, bool $keepAsciiChars = false) : string Converts a UTF-8 character to HTML Numbered Entity like "{". ```php UTF8::single_chr_html_encode('κ'); // 'κ' ``` [](/#splitstring-str-int-length--1-bool-cleanutf8--false--array) split(string $str, int $length = 1, bool $cleanUtf8 = false) : array Convert a string to an array of Unicode characters. ```php UTF8::split('中文空白'); // array('中', '文', '空', '白') ``` [](/#str_detect_encodingstring-str--string) str_detect_encoding(string $str) : string Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32. ```php UTF8::str_detect_encoding('中文空白'); // 'UTF-8' UTF8::str_detect_encoding('Abc'); // 'ASCII' ``` [](/#str_ireplacemixed-search-mixed-replace-mixed-subject-int-count--null--mixed) str_ireplace(mixed $search, mixed $replace, mixed $subject, int &$count = null) : mixed Case-insensitive and UTF-8 safe version of str_replace. ```php UTF8::str_ireplace('lIzÆ', 'lise', array('Iñtërnâtiônàlizætiøn')); // array('Iñtërnâtiônàlisetiøn') ``` [](/#str_limit_after_wordstring-str-int-length--100-stirng-straddon----string) str_limit_after_word(string $str, int $length = 100, stirng $strAddOn = '...') : string Limit the number of characters in a string, but also after the next word. ```php UTF8::str_limit_after_word('fòô bàř fòô', 8, ''); // 'fòô bàř' ``` [](/#str_padstring-str-int-pad_length-string-pad_string----int-pad_type--str_pad_right--string) str_pad(string $str, int $pad_length, string $pad_string = ' ', int $pad_type = STR_PAD_RIGHT) : string Pad a UTF-8 string to given length with another string. ```php UTF8::str_pad('中文空白', 10, '_', STR_PAD_BOTH); // '___中文空白___' ``` [](/#str_padstring-str-int-pad_length-string-pad_string----int-pad_type--str_pad_right--string-1) str_pad(string $str, int $pad_length, string $pad_string = ' ', int $pad_type = STR_PAD_RIGHT) : string Pad a UTF-8 string to given length with another string. ```php UTF8::str_pad('中文空白', 10, '_', STR_PAD_BOTH); // '___中文空白___' ``` [](/#str_repeatstring-str-int-multiplier--string) str_repeat(string $str, int $multiplier) : string Repeat a string. ```php UTF8::str_repeat("°~\xf0\x90\x28\xbc", 2); // '°~ð(¼°~ð(¼' ``` [](/#str_shufflestring-str--string) str_shuffle(string $str) : string Shuffles all the characters in the string. ```php UTF8::str_shuffle('fòô bàř fòô'); // 'àòôřb ffòô ' ``` [](/#str_sortstring-str-bool-unique--false-bool-desc--false--string) str_sort(string $str, bool $unique = false, bool $desc = false) : string Sort all characters according to code points. ```php UTF8::str_sort(' -ABC-中文空白- '); // ' ---ABC中文白空' ``` [](/#str_splitstring-str-int-len--1--array) str_split(string $str, int $len = 1) : array Split a string into an array. ```php UTF8::split('déjà', 2); // array('dé', 'jà') ``` [](/#str_to_binarystring-str--string) str_to_binary(string $str) : string Get a binary representation of a specific string. INFO: opposite to UTF8::binary_to_str() ```php UTF8::str_to_binary('😃'); // '11110000100111111001100010000011' ``` [](/#str_word_countstring-str-int-format--0-string-charlist----string) str_word_count(string $str, int $format = 0, string $charlist = '') : string Get a binary representation of a specific string. ```php // format: 0 -> return only word count (int) // UTF8::str_word_count('中文空白 öäü abc#c'); // 4 UTF8::str_word_count('中文空白 öäü abc#c', 0, '#'); // 3 // format: 1 -> return words (array) // UTF8::str_word_count('中文空白 öäü abc#c', 1); // array('中文空白', 'öäü', 'abc', 'c') UTF8::str_word_count('中文空白 öäü abc#c', 1, '#'); // array('中文空白', 'öäü', 'abc#c') // format: 2 -> return words with offset (array) // UTF8::str_word_count('中文空白 öäü ab#c', 2); // array(0 => '中文空白', 5 => 'öäü', 9 => 'abc', 13 => 'c') UTF8::str_word_count('中文空白 öäü ab#c', 2, '#'); // array(0 => '中文空白', 5 => 'öäü', 9 => 'abc#c') ``` [](/#strcmpstring-str1-string-str2--int) strcmp(string $str1, string $str2) : int Case-insensitive string comparison: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. ```php UTF8::strcmp("iñtërnâtiôn\nàlizætiøn", "iñtërnâtiôn\nàlizætiøn"); // 0 ``` [](/#strnatcmpstring-str1-string-str2--int) strnatcmp(string $str1, string $str2) : int Case sensitive string comparisons using a "natural order" algorithm: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. INFO: natural order version of UTF8::strcmp() ```php UTF8::strnatcmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // -1 UTF8::strcmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // 1 UTF8::strnatcmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // 1 UTF8::strcmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!')); // -1 ``` [](/#strcasecmpstring-str1-string-str2--int) strcasecmp(string $str1, string $str2) : int Case-insensitive string comparison: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. INFO: Case-insensitive version of UTF8::strcmp() ```php UTF8::strcasecmp("iñtërnâtiôn\nàlizætiøn", "Iñtërnâtiôn\nàlizætiøn"); // 0 ``` [](/#strnatcasecmpstring-str1-string-str2--int) strnatcasecmp(string $str1, string $str2) : int Case insensitive string comparisons using a "natural order" algorithm: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. INFO: natural order version of UTF8::strcasecmp() ```php UTF8::strnatcasecmp('2', '10Hello WORLD 中文空白!'); // -1 UTF8::strcasecmp('2Hello world 中文空白!', '10Hello WORLD 中文空白!'); // 1 UTF8::strnatcasecmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // 1 UTF8::strcasecmp('10Hello world 中文空白!', '2Hello WORLD 中文空白!'); // -1 ``` [](/#strncasecmpstring-str1-string-str2-int-len--int) strncasecmp(string $str1, string $str2, int $len) : int Case-insensitive string comparison of the first n characters.: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. INFO: Case-insensitive version of UTF8::strncmp() ```php UTF8::strcasecmp("iñtërnâtiôn\nàlizætiøn321", "iñtërnâtiôn\nàlizætiøn123", 5); // 0 ``` [](/#strncasecmpstring-str1-string-str2-int-len--int-1) strncasecmp(string $str1, string $str2, int $len) : int Case-insensitive string comparison of the first n characters.: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. INFO: Case-insensitive version of UTF8::strncmp() ```php UTF8::strcasecmp("iñtërnâtiôn\nàlizætiøn321", "Iñtërnâtiôn\nàlizætiøn123", 5); // 0 ``` [](/#strncmpstring-str1-string-str2-int-len--int) strncmp(string $str1, string $str2, int $len) : int Case-sensitive string comparison of the first n characters.: < 0 if str1 is less than str2; > 0 if str1 is greater than str2, 0 if they are equal. ```php UTF8::strncmp("Iñtërnâtiôn\nàlizætiøn321", "Iñtërnâtiôn\nàlizætiøn123", 5); // 0 ``` [](/#stringstring-str1-string-str2--int) string(string $str1, string $str2) : int Create a UTF-8 string from code points. INFO: opposite to UTF8::codepoints() ```php UTF8::string(array(246, 228, 252)); // 'öäü' ``` [](/#string_has_bomstring-str--bool) string_has_bom(string $str) : bool Checks if string starts with "BOM" (Byte Order Mark Character) character. alias: UTF8::hasBom() ```php UTF8::string_has_bom("\xef\xbb\xbf foobar"); // true ``` [](/#strip_tagsstring-str-stingnull-allowable_tags--null--string) strip_tags(string $str, sting|null $allowable_tags = null) : string Strip HTML and PHP tags from a string + clean invalid UTF-8. ```php UTF8::strip_tags("<span>κόσμε\xa0\xa1</span>"); // 'κόσμε' ``` [](/#strlenstring-str-string-encoding--utf-8-bool-cleanutf8--false--int) strlen(string $str, string $encoding = 'UTF-8', bool $cleanUtf8 = false) : int Get the string length, not the byte-length! ```php UTF8::strlen("Iñtërnâtiôn\xE9àlizætiøn")); // 20 ``` [](/#strwidthstring-str-string-encoding--utf-8-bool-cleanutf8--false--int) strwidth(string $str, string $encoding = 'UTF-8', bool $cleanUtf8 = false) : int Return the width of a string. ```php UTF8::strwidth("Iñtërnâtiôn\xE9àlizætiøn")); // 21 ``` [](/#strpbrkstring-haystack-string-char_list--string) strpbrk(string $haystack, string $char_list) : string Search a string for any of a set of characters. ```php UTF8::strpbrk('-中文空白-', '白'); // '白-' ``` [](/#strposstring-haystack-string-char_list--intfalse) strpos(string $haystack, string $char_list) : int|false Find position of first occurrence of string in a string. ```php UTF8::strpos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 8 ``` [](/#striposstr-needle-before_needle--false--intfalse) stripos($str, $needle, $before_needle = false) : int|false Finds position of first occurrence of a string within another, case insensitive. ```php UTF8::strpos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 8 ``` [](/#strrposstring-haystack-string-needle-int-offset--0-bool-cleanutf8--false--stringfalse) strrpos(string $haystack, string $needle, int $offset = 0, bool $cleanUtf8 = false) : string|false Find position of last occurrence of a string in a string. ```php UTF8::strrpos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 13 ``` [](/#strriposstring-haystack-string-needle-int-offset--0-bool-cleanutf8--false--stringfalse) strripos(string $haystack, string $needle, int $offset = 0, bool $cleanUtf8 = false) : string|false Find position of last occurrence of a case-insensitive string. ```php UTF8::strripos('ABC-ÖÄÜ-中文空白-中文空白', '中'); // 13 ``` [](/#strrchrstring-haystack-string-needle-bool-part--false-string-encoding--stringfalse) strrchr(string $haystack, string $needle, bool $part = false, string $encoding) : string|false Finds the last occurrence of a character in a string within another. ```php UTF8::strrchr('κόσμεκόσμε-äöü', 'κόσμε'); // 'κόσμε-äöü' ``` [](/#strrichrstring-haystack-string-needle-bool-part--false-string-encoding--stringfalse) strrichr(string $haystack, string $needle, bool $part = false, string $encoding) : string|false Finds the last occurrence of a character in a string within another, case insensitive. ```php UTF8::strrichr('Aκόσμεκόσμε-äöü', 'aκόσμε'); // 'Aκόσμεκόσμε-äöü' ``` [](/#strrevstring-str--string) strrev(string $str) : string Reverses characters order in the string. ```php UTF8::strrev('κ-öäü'); // 'üäö-κ' ``` [](/#strspnstring-str-string-mask-int-offset--0-int-length--2147483647--string) strspn(string $str, string $mask, int $offset = 0, int $length = 2147483647) : string Finds the length of the initial segment of a string consisting entirely of characters contained within a given mask. ```php UTF8::strspn('iñtërnâtiônàlizætiøn', 'itñ'); // '3' ``` [](/#strstrstring-str-string-needle-bool-before_needle--false--string) strstr(string $str, string $needle, bool $before_needle = false) : string Returns part of haystack string from the first occurrence of needle to the end of haystack. ```php $str = 'iñtërnâtiônàlizætiøn'; $search = 'nât'; UTF8::strstr($str, $search)); // 'nâtiônàlizætiøn' UTF8::strstr($str, $search, true)); // 'iñtër' ``` [](/#stristrstring-str-string-needle-bool-before_needle--false--string) stristr(string $str, string $needle, bool $before_needle = false) : string Returns all of haystack starting from and including the first occurrence of needle to the end. ```php $str = 'iñtërnâtiônàlizætiøn'; $search = 'NÂT'; UTF8::stristr($str, $search)); // 'nâtiônàlizætiøn' UTF8::stristr($str, $search, true)); // 'iñtër' ``` [](/#strtocasefoldstring-str-bool-full--true--string) strtocasefold(string $str, bool $full = true) : string Unicode transformation for case-less matching. ```php UTF8::strtocasefold('ǰ◌̱'); // 'ǰ◌̱' ``` [](/#strtolowerstring-str-string-encoding--utf-8--string) strtolower(string $str, string $encoding = 'UTF-8') : string Make a string lowercase. ```php UTF8::strtolower('DÉJÀ Σσς Iıİi'); // 'déjà σσς iıii' ``` [](/#strtoupperstring-str-string-encoding--utf-8--string) strtoupper(string $str, string $encoding = 'UTF-8') : string Make a string uppercase. ```php UTF8::strtoupper('Déjà Σσς Iıİi'); // 'DÉJÀ ΣΣΣ IIİI' ``` [](/#strtrstring-str-stringarray-from-stringarray-to--inf--string) strtr(string $str, string|array $from, string|array $to = INF) : string Translate characters or replace sub-strings. ```php $arr = array( 'Hello' => '○●◎', '中文空白' => 'earth', ); UTF8::strtr('Hello 中文空白', $arr); // '○●◎ earth' ``` [](/#substrstring-str-int-start--0-int-length--null-string-encoding--utf-8-bool-cleanutf8--false--string) substr(string $str, int $start = 0, int $length = null, string $encoding = 'UTF-8', bool $cleanUtf8 = false) : string Get part of a string. ```php UTF8::substr('中文空白', 1, 2); // '文空' ``` [](/#substr_comparestring-main_str-string-str-int-offset-int-length--2147483647-bool-case_insensitivity--false--int) substr_compare(string $main_str, string $str, int $offset, int $length = 2147483647, bool $case_insensitivity = false) : int Binary safe comparison of two strings from an offset, up to length characters. ```php UTF8::substr_compare("○●◎\r", '●◎', 0, 2); // -1 UTF8::substr_compare("○●◎\r", '◎●', 1, 2); // 1 UTF8::substr_compare("○●◎\r", '●◎', 1, 2); // 0 ``` [](/#substr_countstring-haystack-string-needle-int-offset--0-int-length--null-string-encoding--utf-8--int) substr_count(string $haystack, string $needle, int $offset = 0, int $length = null, string $encoding = 'UTF-8') : int Count the number of substring occurrences. ```php UTF8::substr_count('中文空白', '文空', 1, 2); // 1 ``` [](/#substr_replacestringstring-str-stringstring-replacement-intint-start-intint-length--null--stringarray) substr_replace(string|string[] $str, string|string[] $replacement, int|int[] $start, int|int[] $length = null) : string|array Replace text within a portion of a string. ```php UTF8::substr_replace(array('Iñtërnâtiônàlizætiøn', 'foo'), 'æ', 1); // array('Iæñtërnâtiônàlizætiøn', 'fæoo') ``` [](/#swapcasestring-str-string-string-encoding--utf-8--string) swapCase(string $str, string string $encoding = 'UTF-8') : string Returns a case swapped version of the string. ```php UTF8::swapCase('déJÀ σσς iıII'); // 'DÉjà ΣΣΣ IIii' ``` [](/#swapcasestring-str-string-string-encoding--utf-8--string-1) swapCase(string $str, string string $encoding = 'UTF-8') : string Returns a case swapped version of the string. ```php UTF8::swapCase('déJÀ σσς iıII'); // 'DÉjà ΣΣΣ IIii' ``` [](/#to_asciistring-str-string-unknown----string) to_ascii(string $str, string $unknown = '?') : string Convert a string into ASCII. alias: UTF8::toAscii() ```php UTF8::to_ascii('déjà σσς iıii'); // 'deja sss iiii' ``` [](/#to_utf8stringstring-str--stringstring) to_utf8(string|string[] $str) : string|string[] This function leaves UTF8 characters alone, while converting almost all non-UTF8 to UTF8. alias: UTF8::toUtf8() ```php UTF8::to_utf8("\u0063\u0061\u0074"); // 'cat' ``` [](/#to_iso8859stringstring-str--stringstring) to_iso8859(string|string[] $str) : string|string[] Convert a string into "ISO-8859"-encoding (Latin-1). alias: UTF8::toIso8859() alias: UTF8::to_latin1() alias: UTF8::toLatin1() ```php UTF8::to_utf8(UTF8::to_latin1(' -ABC-中文空白- ')); // ' -ABC-????- ' ``` [](/#ucfirststring-str--string) ucfirst(string $str) : string Makes string's first char uppercase. alias: UTF8::ucword() ```php UTF8::ucfirst('ñtërnâtiônàlizætiøn'); // 'Ñtërnâtiônàlizætiøn' ``` [](/#ucwordsstring-str--string) ucwords(string $str) : string Uppercase for all words in the string. ```php UTF8::ucwords('iñt ërn âTi ônà liz æti øn'); // 'Iñt Ërn ÂTi Ônà Liz Æti Øn' ``` [](/#urldecodestring-str--string) urldecode(string $str) : string Multi decode html entity & fix urlencoded-win1252-chars. ```php UTF8::urldecode('tes%20öäü%20\u00edtest'); // 'tes öäü ítest' ``` [](/#utf8_decodestring-str--string) utf8_decode(string $str) : string Decodes an UTF-8 string to ISO-8859-1. ```php UTF8::encode('UTF-8', UTF8::utf8_decode('-ABC-中文空白-')); // '-ABC-????-' ``` [](/#utf8_encodestring-str--string) utf8_encode(string $str) : string Encodes an ISO-8859-1 string to UTF-8. ```php UTF8::utf8_decode(UTF8::utf8_encode('-ABC-中文空白-')); // '-ABC-中文空白-' ``` [](/#words_limitstring-str-int-words--100-string-straddon----string) words_limit(string $str, int $words = 100, string $strAddOn = '...') : string Limit the number of words in a string. ```php UTF8::words_limit('fòô bàř fòô', 2, ''); // 'fòô bàř' ``` [](/#wordwrapstring-str-int-width--75-string-break--n-bool-cut--false--string) wordwrap(string $str, int $width = 75, string $break = "\n", bool $cut = false) : string Wraps a string to a given number of characters ```php UTF8::wordwrap('Iñtërnâtiônàlizætiøn', 10, "\n", true)); // 'Iñ<br>të<br>rn<br>ât<br>iô<br>nà<br>li<br>zæ<br>ti<br>øn' ```