Extension Writing Part II: Parameters, Arrays, and ZVALs

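/**
 Before starting this part, let's briefly recap what the previous part covered.
 We took a quick look at what a PHP extension is and studied the module life cycle. We wrote a few simple functions ourselves and looked at what to watch out for when returning dynamic versus static values. Beyond that, we touched on several other topics, shallowly but broadly.

 In this part, we'll look at how an extension module interprets the parameters passed in from a PHP script, and at how PHP and the Zend Engine manage variables.
*/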

Accepting Values
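Contrary to what you might expect, the parameters of an internal function are not declared on the function itself. Instead, only a reference to the list of passed parameters is handed over. The function then asks the Zend Engine (ZE) to turn that list into something usable: variables.

Let's start by defining a new function named hello_greetme(). It takes one string parameter and prints it along with a simple greeting. As before, we'll add new code in three places.

In php_hello.h, add the following prototype after the other function prototypes: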

PHP_FUNCTION(hello_greetme); 

In hello.c, add the following entry at the end of the hello_functions structure:

    PHP_FE(hello_bool, NULL)
    PHP_FE(hello_null, NULL)
    PHP_FE(hello_greetme, NULL)
    {NULL, NULL, NULL}
}; 

And at the very end of hello.c, where the function definitions are collected, add the function body:

PHP_FUNCTION(hello_greetme)
{
    char *name;
    int name_len;
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &name, &name_len) == FAILURE) {
        RETURN_NULL();
    }
    php_printf("Hello %s ", name);
    RETURN_TRUE;
}

Let's look at zend_parse_parameters(). ZEND_NUM_ARGS() tells the ZE how many arguments were actually passed, and TSRMLS_CC passes along the thread-safety context. The function returns either SUCCESS or FAILURE. Normally it returns SUCCESS, but if the script passes more or fewer parameters than specified, or passes a parameter that cannot be converted to the requested data type, Zend automatically raises an error message and zend_parse_parameters() returns FAILURE, leaving your function to hand control back to the caller gracefully.

In the example above, hello_greetme() tells zend_parse_parameters() that exactly one parameter is expected and that it must be a string ("s"). To receive that string, it passes the address of a char* variable. Note one more thing: the address of an int variable is passed as well. Through this variable the ZE returns the length of the string.

Next, the string received through zend_parse_parameters() is combined with a greeting and printed. Note that php_printf() is used instead of the familiar printf(). There are two reasons for this. First, the output goes through PHP's output buffering mechanism, which means that in addition to buffering the data it can apply extra processing such as gzip compression. Second, while CLI and CGI write their output to stdout, most SAPIs send output through a specific pipe or socket. Relying on plain printf() could therefore lose output wherever standard output is not the output channel.

Finally, let's look at the return value. The function above simply returns TRUE. If you prefer, a function does not have to return a value explicitly. In that case NULL is returned by default, but that is not a great habit. If there really is nothing to return, get into the habit of returning TRUE to signal that everything went well.

zend_parse_parameters() can also handle optional parameters. The next example takes a long, a double (float), and an optional boolean. Here, "optional parameter" means a parameter the caller may specify, but which falls back to a default value (false in the example below) when it is omitted:

function hello_add($a, $b, $return_long = false) {
    $sum = (int)$a + (float)$b;
    if ($return_long) {
        return intval($sum);
    } else {
        return floatval($sum);
    }
}

Now let's look at the C code (only the function definition is shown below; don't forget to add the prototype to php_hello.h and the entry to hello_functions[]!):

PHP_FUNCTION(hello_add)
{
    long a;
    double b;
    zend_bool return_long = 0;
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ld|b", &a, &b, &return_long) == FAILURE) {
        RETURN_NULL();
    }
    if (return_long) {
        RETURN_LONG(a + b);
    } else {
        RETURN_DOUBLE(a + b);
    }
}

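Let's look at the type string above ("ld|b"). The l denotes a long and the d a double. The next character is a pipe, which indicates that every character after it describes an optional parameter. If an optional parameter is not passed, zend_parse_parameters() leaves the corresponding variable untouched, so the value it was initialized with (0 for return_long here) is used. The last character is b, which of course denotes a boolean. The a, b, and return_long that follow the type string are, as described earlier, passed by address so that zend_parse_parameters() has somewhere to store the incoming values.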
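To make the role of the pipe concrete, here is a minimal sketch in plain C. It is a toy model, not the engine's parser: spec_arity() is a hypothetical helper that only understands the letters listed in Table 1 and reports how many script arguments a type string requires and allows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: counts how many script arguments a type string such
 * as "ld|b" requires (min_args) and allows (max_args).
 * Returns 0 on success, -1 if the string is malformed. */
int spec_arity(const char *spec, int *min_args, int *max_args)
{
    int count = 0;
    int required = -1;              /* stays -1 until a '|' is seen */
    const char *p;

    assert(spec != NULL && min_args != NULL && max_args != NULL);

    for (p = spec; *p != '\0'; p++) {
        if (*p == '|') {
            if (required != -1) {
                return -1;          /* a second '|' is malformed */
            }
            required = count;       /* everything so far is mandatory */
        } else if (strchr("bldsraoz", *p) != NULL) {
            count++;                /* one script argument per letter */
        } else {
            return -1;              /* letter not covered by this sketch */
        }
    }

    *min_args = (required == -1) ? count : required;
    *max_args = count;
    return 0;
}
```

Under these assumptions, "ld|b" yields a minimum of 2 and a maximum of 3 arguments, which matches hello_add(): $a and $b are mandatory, $return_long is optional.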

Table 1 lists the various types and their corresponding letter codes:

Table 1: Types and letter codes used in zend_parse_parameters()
Type      Code  Variable Type
Boolean   b     zend_bool
Long      l     long
Double    d     double
String    s     char*, int
Resource  r     zval*
Array     a     zval*
Object    o     zval*
zval      z     zval*

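Note that the last four types all yield nothing more than a zval*. A zval is the data type that actually represents a PHP variable. The three complex data types mentioned in the previous part (Resource, Array, and Object) have no suitable native representation in C, so they are handed over as a zval*.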

The ZVAL

The zval, and PHP userspace variables in general, will easily be the most difficult concepts you'll need to wrap your head around. They will also be the most vital. To begin with, let's look at the structure of a zval:

struct {
    union {
        long lval;
        double dval;
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;
        zend_object_value obj;
    } value;
    zend_uint refcount;
    zend_uchar type;
    zend_uchar is_ref;
} zval;

As you can see, every zval has three basic elements in common: type, is_ref, and refcount. is_ref and refcount will be covered later on in this tutorial; for now let's focus on type.

By now you should already be familiar with PHP's eight data types. They're the seven listed in Table 1, plus NULL, which despite (or perhaps because of) the fact that it literally is nothing, is a type unto its own. Given a particular zval, the type can be examined using one of three convenience macros: Z_TYPE(zval), Z_TYPE_P(zval*), or Z_TYPE_PP(zval**). The only functional difference between these three is the level of indirection expected in the variable passed into it. The convention of using _P and _PP is repeated in other macros, such as the *VAL macros you're about to look at.
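The _P and _PP convention is easier to see in miniature. The following is a sketch using a hypothetical toy_zval and TOY_TYPE* macros, loosely modeled on the description above rather than copied from the Zend headers; each macro simply removes one more level of indirection before reading the same field.

```c
#include <assert.h>

/* Hypothetical miniature of a zval: a value union plus a type tag. */
typedef enum { TOY_NULL, TOY_LONG, TOY_DOUBLE } toy_type;

typedef struct {
    union {
        long lval;
        double dval;
    } value;
    toy_type type;
} toy_zval;

/* Same field, three levels of indirection. */
#define TOY_TYPE(zv)       ((zv).type)
#define TOY_TYPE_P(zv_p)   TOY_TYPE(*(zv_p))
#define TOY_TYPE_PP(zv_pp) TOY_TYPE(**(zv_pp))

/* Returns 1 when all three access paths report the same type tag. */
int toy_type_macros_agree(void)
{
    toy_zval zv;
    toy_zval *zv_p = &zv;
    toy_zval **zv_pp = &zv_p;

    zv.type = TOY_LONG;
    zv.value.lval = 42;

    assert(*zv_pp == zv_p);         /* both pointers lead to zv */

    return TOY_TYPE(zv) == TOY_LONG
        && TOY_TYPE_P(zv_p) == TOY_LONG
        && TOY_TYPE_PP(zv_pp) == TOY_LONG;
}
```

Which variant you reach for depends only on what you are holding: a zval, a zval*, or a zval**.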

The value of type determines which portion of the zval's value union will be set. The following piece of code demonstrates a scaled down version of var_dump():

PHP_FUNCTION(hello_dump)
{
    zval *uservar;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &uservar) == FAILURE) {
        RETURN_NULL();
    }

    switch (Z_TYPE_P(uservar)) {
        case IS_NULL:
            php_printf("NULL ");
            break;
        case IS_BOOL:
            php_printf("Boolean: %s ", Z_LVAL_P(uservar) ? "TRUE" : "FALSE");
            break;
        case IS_LONG:
            php_printf("Long: %ld ", Z_LVAL_P(uservar));
            break;
        case IS_DOUBLE:
            php_printf("Double: %f ", Z_DVAL_P(uservar));
            break;
        case IS_STRING:
            php_printf("String: ");
            PHPWRITE(Z_STRVAL_P(uservar), Z_STRLEN_P(uservar));
            php_printf(" ");
            break;
        case IS_RESOURCE:
            php_printf("Resource ");
            break;
        case IS_ARRAY:
            php_printf("Array ");
            break;
        case IS_OBJECT:
            php_printf("Object ");
            break;
        default:
            php_printf("Unknown ");
    }

    RETURN_TRUE;
}

As you can see, the Boolean data type shares the same internal element as the long data type. Just as with RETURN_BOOL(), which you used in Part One of this series, FALSE is represented by 0, while TRUE is represented by 1.

When you use zend_parse_parameters() to request a specific data type, such as string, the Zend Engine checks the type of the incoming variable. If it matches, Zend simply passes through the corresponding parts of the zval to the right data types. If it's of a different type, Zend converts it as is appropriate and/or possible, using its usual type-juggling rules.

Modify the hello_greetme() function you implemented earlier by separating it out into smaller pieces:

PHP_FUNCTION(hello_greetme)
{
    zval *zname;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zname) == FAILURE) {
        RETURN_NULL();
    }

    convert_to_string(zname);

    php_printf("Hello ");
    PHPWRITE(Z_STRVAL_P(zname), Z_STRLEN_P(zname));
    php_printf(" ");

    RETURN_TRUE;
}

This time, zend_parse_parameters() was told to simply retrieve a PHP variable (zval) regardless of type, then the function explicitly cast the variable to a string (similar to $zname = (string)$zname; ), then PHPWRITE() was called using the STRing VALue and STRing LENgth of the zname structure. As you've probably guessed, other convert_to_*() functions exist for bool, long, and double.

Creating ZVALs

So far, the zvals you've worked with have been allocated by the Zend Engine and will be freed the same way. Sometimes, however, it's necessary to create your own zval. Consider the following block of code:

{
    zval *temp;

    ALLOC_INIT_ZVAL(temp);

    Z_TYPE_P(temp) = IS_LONG;
    Z_LVAL_P(temp) = 1234;

    zval_ptr_dtor(&temp);
}

ALLOC_INIT_ZVAL(), as its name implies, allocates memory for a zval* and initializes it as a new variable. Once that's done, the Z_*_P() macros can be used to set the type and value of this variable. zval_ptr_dtor() handles the dirty work of cleaning up the memory allocated for the variable.

The two Z_*_P() calls could have actually been reduced to a single statement:

ZVAL_LONG(temp, 1234);

Similar macros exist for the other types, and follow the same syntax as the RETURN_*() macros you saw in Part One of this series. In fact the RETURN_*() macros are just thin wrappers for RETVAL_*() and, by extension, ZVAL_*(). The following five versions are all identical:

RETURN_LONG(42);

RETVAL_LONG(42);
return;

ZVAL_LONG(return_value, 42);
return;

Z_TYPE_P(return_value) = IS_LONG;
Z_LVAL_P(return_value) = 42;
return;

return_value->type = IS_LONG;
return_value->value.lval = 42;
return;

If you're sharp, you're thinking about the impact of how these macros are defined on the way they're used in functions like hello_long(). "Where does return_value come from and why isn't it being allocated with ALLOC_INIT_ZVAL()?", you might be wondering.

While it may be hidden from you in your day-to-day extension writing, return_value is actually a function parameter defined in the prototype of every PHP_FUNCTION() definition. The Zend Engine allocates memory for it and initializes it as NULL so that even if your function doesn't explicitly set it, a value will still be available to the calling program. When your internal function finishes executing, it passes that value to the calling program, or frees it if the calling program is written to ignore it.

Arrays

Since you've used PHP in the past, you've already recognized an array as a variable whose purpose is to carry around other variables. The way this is represented internally is through a structure known as a HashTable. When creating arrays to be returned to PHP, the simplest approach involves using one of the functions listed in Table 2.

Table 2: zval array creation functions
PHP Syntax             C Syntax (arr is a zval*)                     Meaning
$arr = array();        array_init(arr);                              Initialize a new array
$arr[] = NULL;         add_next_index_null(arr);                     Add a value of a given type to a numerically indexed array
$arr[] = 42;           add_next_index_long(arr, 42);
$arr[] = true;         add_next_index_bool(arr, 1);
$arr[] = 3.14;         add_next_index_double(arr, 3.14);
$arr[] = 'foo';        add_next_index_string(arr, "foo", 1);
$arr[] = $myvar;       add_next_index_zval(arr, myvar);
$arr[0] = NULL;        add_index_null(arr, 0);                       Add a value of a given type to a specific index in an array
$arr[1] = 42;          add_index_long(arr, 1, 42);
$arr[2] = true;        add_index_bool(arr, 2, 1);
$arr[3] = 3.14;        add_index_double(arr, 3, 3.14);
$arr[4] = 'foo';       add_index_string(arr, 4, "foo", 1);
$arr[5] = $myvar;      add_index_zval(arr, 5, myvar);
$arr['abc'] = NULL;    add_assoc_null(arr, "abc");                   Add a value of a given type to an associatively indexed array
$arr['def'] = 711;     add_assoc_long(arr, "def", 711);
$arr['ghi'] = true;    add_assoc_bool(arr, "ghi", 1);
$arr['jkl'] = 1.44;    add_assoc_double(arr, "jkl", 1.44);
$arr['mno'] = 'baz';   add_assoc_string(arr, "mno", "baz", 1);
$arr['pqr'] = $myvar;  add_assoc_zval(arr, "pqr", myvar);

As with the RETURN_STRING() macro, the add_*_string() functions take a 1 or a 0 in the final parameter to indicate whether the string contents should be copied. They also have a kissing cousin in the form of an add_*_stringl() variant for each. The l indicates that the length of the string will be explicitly provided (rather than having the Zend Engine determine this with a call to strlen(), which is binary-unsafe).

Using this binary-safe form is as simple as specifying the length just before the duplication parameter, like so:

add_assoc_stringl(arr, "someStringVar", "baz", 3, 1);
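Why an explicit length matters is easy to show outside of PHP. The sketch below uses two hypothetical helpers, dup_with_length() and bytes_lost_by_strlen(), and assumes the input buffer is NUL-terminated somewhere within its bounds; it illustrates the general C problem, not how the Zend Engine stores strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies exactly len bytes and appends a terminating NUL.
 * Returns NULL if allocation fails; the caller frees the result. */
char *dup_with_length(const char *src, size_t len)
{
    char *copy;

    assert(src != NULL);

    copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len);         /* embedded NUL bytes survive */
    copy[len] = '\0';
    return copy;
}

/* Returns how many bytes a strlen()-based copy would silently drop. */
size_t bytes_lost_by_strlen(const char *src, size_t len)
{
    size_t seen;

    assert(src != NULL);

    seen = strlen(src);             /* stops at the first NUL byte */
    return (seen < len) ? len - seen : 0;
}
```

For the seven-byte buffer "foo\0bar", strlen() sees only three bytes, so a length-less copy would drop the remaining four. Passing the length explicitly, as the stringl variants do, avoids that.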

Using the add_assoc_*() functions, all array keys are assumed to contain no NULLs - the add_assoc_*() functions themselves are not binary-safe with respect to keys. Using keys with NULLs in them is discouraged (as it is already a technique used with protected and private object properties), but if doing so is necessary, you'll learn how you can do it soon enough, when we get into the zend_hash_*() functions later.

To put what you've just learned into practice, create the following function to return an array of values to the calling program. Be sure to add entries to php_hello.h and hello_functions[] so as to properly declare this function.

PHP_FUNCTION(hello_array)
{
    char *mystr;
    zval *mysubarray;

    array_init(return_value);

    add_index_long(return_value, 42, 123);

    add_next_index_string(return_value, "I should now be found at index 43", 1);

    add_next_index_stringl(return_value, "I'm at 44!", 10, 1);

    mystr = estrdup("Forty Five");
    add_next_index_string(return_value, mystr, 0);

    add_assoc_double(return_value, "pi", 3.1415926535);

    ALLOC_INIT_ZVAL(mysubarray);
    array_init(mysubarray);
    add_next_index_string(mysubarray, "hello", 1);
    add_assoc_zval(return_value, "subarray", mysubarray);    
}

Building this extension and issuing var_dump(hello_array()); gives:

array(6) {
  [42]=>
  int(123)
  [43]=>
  string(33) "I should now be found at index 43"
  [44]=>
  string(10) "I'm at 44!"
  [45]=>
  string(10) "Forty Five"
  ["pi"]=>
  float(3.1415926535)
  ["subarray"]=>
  array(1) {
    [0]=>
    string(5) "hello"
  }
}

Reading values back out of arrays means extracting them as zval**s directly from a HashTable using the zend_hash family of functions from the ZENDAPI. Let's start with a simple function which accepts one array as a parameter:

function hello_array_strings($arr) {

    if (!is_array($arr)) return NULL;

    printf("The array passed contains %d elements ", count($arr));

    foreach($arr as $data) {
        if (is_string($data)) echo "$data ";
    }
}

Or, in C:

PHP_FUNCTION(hello_array_strings)
{
    zval *arr, **data;
    HashTable *arr_hash;
    HashPosition pointer;
    int array_count;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &arr) == FAILURE) {
        RETURN_NULL();
    }

    arr_hash = Z_ARRVAL_P(arr);
    array_count = zend_hash_num_elements(arr_hash);

    php_printf("The array passed contains %d elements ", array_count);

    for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

        if (Z_TYPE_PP(data) == IS_STRING) {
            PHPWRITE(Z_STRVAL_PP(data), Z_STRLEN_PP(data));
            php_printf(" ");
        }
    }
    RETURN_TRUE;
}

In this function only those array elements which are of type string are output, in order to keep the function brief. You may be wondering why we didn't just use convert_to_string() as we did in the hello_greetme() function earlier. Let's give that a shot; replace the for loop above with the following:

    for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

        convert_to_string_ex(data);
        PHPWRITE(Z_STRVAL_PP(data), Z_STRLEN_PP(data));
        php_printf(" ");
    }

Now compile your extension again and run the following userspace code through it:

<?php

$a = array('foo',123);
var_dump($a);
hello_array_strings($a);
var_dump($a);

?>

Notice that the original array was changed! Remember, the convert_to_*() functions have the same effect as calling settype(). Since you're working with the same array that was passed in, changing an element's type here will change the original variable. In order to avoid this, you need to first make a copy of the zval. To do this, change that for loop again to the following:

    for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

        zval temp;

        temp = **data;
        zval_copy_ctor(&temp);
        convert_to_string(&temp);
        PHPWRITE(Z_STRVAL(temp), Z_STRLEN(temp));
        php_printf(" ");
        zval_dtor(&temp);
    }

The more obvious part of this version - temp = **data - just copies the data members of the original zval, but since a zval may contain additional resource allocations like char* strings, or HashTable* arrays, the dependent resources need to be duplicated with zval_copy_ctor(). From there it's just an ordinary convert, print, and a final zval_dtor() to get rid of the resources used by the copy.
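The difference between the struct assignment and the copy constructor can be reproduced with a few lines of plain C. This is a sketch with a hypothetical toy_str type; toy_copy_ctor() and toy_dtor() only stand in for the roles that zval_copy_ctor() and zval_dtor() play in the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *val;                      /* heap buffer, NUL-terminated */
    size_t len;
} toy_str;

/* After a struct assignment, gives s its own duplicate of the buffer.
 * Returns 0 on success, -1 if allocation fails. */
int toy_copy_ctor(toy_str *s)
{
    char *dup;

    assert(s != NULL && s->val != NULL);

    dup = malloc(s->len + 1);
    if (dup == NULL) {
        return -1;
    }
    memcpy(dup, s->val, s->len + 1);
    s->val = dup;
    return 0;
}

/* Releases the buffer owned by s. */
void toy_dtor(toy_str *s)
{
    assert(s != NULL);
    free(s->val);
    s->val = NULL;
    s->len = 0;
}

/* Returns 1 if changing a deep copy leaves the original untouched. */
int toy_copy_is_independent(void)
{
    toy_str orig, temp;
    int ok;

    orig.len = 3;
    orig.val = malloc(orig.len + 1);
    if (orig.val == NULL) {
        return 0;
    }
    memcpy(orig.val, "foo", 4);

    temp = orig;                    /* shallow: both share one buffer */
    if (toy_copy_ctor(&temp) != 0) {
        toy_dtor(&orig);
        return 0;
    }
    temp.val[0] = 'b';              /* modifies only the copy */

    ok = temp.val != orig.val
        && strcmp(orig.val, "foo") == 0
        && strcmp(temp.val, "boo") == 0;

    toy_dtor(&temp);
    toy_dtor(&orig);
    return ok;
}
```

Without the toy_copy_ctor() call, the write to temp.val[0] would land in the original's buffer, and the two toy_dtor() calls would free the same memory twice.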

If you're wondering why you didn't do a zval_copy_ctor() when we first introduced convert_to_string(), it's because the act of passing a variable into a function automatically performs a copy separating the zval from the original variable. This is only done on the base zval though, so any subordinate resources (such as array elements and object properties) still need to be separated before use.

Now that you've seen array values, let's extend the exercise a bit by looking at the keys as well:

for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

    zval temp;
    char *key;
    uint key_len;
    ulong index;

    if (zend_hash_get_current_key_ex(arr_hash, &key, &key_len, &index, 0, &pointer) == HASH_KEY_IS_STRING) {
        /* key_len counts the terminating NULL byte, so leave it out */
        PHPWRITE(key, key_len - 1);
    } else {
        php_printf("%ld", (long)index);
    }

    php_printf(" => ");

    temp = **data;
    zval_copy_ctor(&temp);
    convert_to_string(&temp);
    PHPWRITE(Z_STRVAL(temp), Z_STRLEN(temp));
    php_printf(" ");
    zval_dtor(&temp);
}

Remember that arrays can have numeric indexes, associative string keys, or both. Calling zend_hash_get_current_key_ex() makes it possible to fetch either type from the current position in the array, and determine its type based on the return values, which may be any of HASH_KEY_IS_STRING, HASH_KEY_IS_LONG, or HASH_KEY_NON_EXISTANT. Since zend_hash_get_current_data_ex() was able to return a zval**, you can safely assume that HASH_KEY_NON_EXISTANT will not be returned, so only the IS_STRING and IS_LONG possibilities need to be checked.

There's another way to iterate through a HashTable. The Zend Engine exposes three very similar functions to accommodate this task: zend_hash_apply(), zend_hash_apply_with_argument(), and zend_hash_apply_with_arguments(). The first form just loops through a HashTable, the second form allows a single argument to be passed through as a void*, while the third form allows an unlimited number of arguments via a vararg list. hello_array_walk() shows each of these in action:

static int php_hello_array_walk(zval **element TSRMLS_DC)
{
    zval temp;

    temp = **element;
    zval_copy_ctor(&temp);
    convert_to_string(&temp);
    PHPWRITE(Z_STRVAL(temp), Z_STRLEN(temp));
    php_printf(" ");
    zval_dtor(&temp);

    return ZEND_HASH_APPLY_KEEP;
}

static int php_hello_array_walk_arg(zval **element, char *greeting TSRMLS_DC)
{
    php_printf("%s", greeting);
    php_hello_array_walk(element TSRMLS_CC);

    return ZEND_HASH_APPLY_KEEP;
}

static int php_hello_array_walk_args(zval **element, int num_args, va_list args, zend_hash_key *hash_key)
{
    char *prefix = va_arg(args, char*);
    char *suffix = va_arg(args, char*);
    TSRMLS_FETCH();

    php_printf("%s", prefix);
    php_hello_array_walk(element TSRMLS_CC);
    php_printf("%s ", suffix);

    return ZEND_HASH_APPLY_KEEP;
}

PHP_FUNCTION(hello_array_walk)
{
    zval *zarray;
    int print_newline = 1;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &zarray) == FAILURE) {
        RETURN_NULL();
    }

    zend_hash_apply(Z_ARRVAL_P(zarray), (apply_func_t)php_hello_array_walk TSRMLS_CC);
    zend_hash_apply_with_argument(Z_ARRVAL_P(zarray), (apply_func_arg_t)php_hello_array_walk_arg, "Hello " TSRMLS_CC);
    zend_hash_apply_with_arguments(Z_ARRVAL_P(zarray), (apply_func_args_t)php_hello_array_walk_args, 2, "Hello ", "Welcome to my extension!");

    RETURN_TRUE;
}

By now you should be familiar enough with the usage of the functions involved that most of the above code will be obvious. The array passed to hello_array_walk() is looped through three times, once with no arguments, once with a single argument, and a third time with two arguments. In this design, the walk_arg() and walk_args() functions actually rely on the no-argument walk() function to do the job of converting and printing the zval, since the job is common across all three.

In this block of code, as in most places where you'll use zend_hash_apply(), the apply() functions return ZEND_HASH_APPLY_KEEP. This tells the zend_hash_apply() function to leave the element in the HashTable and continue on with the next one. Other values which can be returned here are: ZEND_HASH_APPLY_REMOVE, which does just what it says - removes the current element and continues applying at the next - and ZEND_HASH_APPLY_STOP, which will halt the array walk at the current element and exit the zend_hash_apply() function completely.
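The three return values are easiest to compare side by side in a stripped-down walker. The sketch below is a toy over a plain int array, with hypothetical names (toy_apply(), TOY_APPLY_*); it mirrors the behavior described above but is not how zend_hash_apply() is implemented.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_APPLY_KEEP, TOY_APPLY_REMOVE, TOY_APPLY_STOP };

typedef int (*toy_apply_func)(int *element);

/* Walks items[0..count-1], compacting the array in place.
 * Returns the number of elements that remain. */
size_t toy_apply(int *items, size_t count, toy_apply_func fn)
{
    size_t read, write = 0;

    assert(items != NULL && fn != NULL);

    for (read = 0; read < count; read++) {
        int verdict = fn(&items[read]);

        if (verdict == TOY_APPLY_STOP) {
            /* keep this and all later elements, call fn no more */
            for (; read < count; read++) {
                items[write++] = items[read];
            }
            break;
        }
        if (verdict != TOY_APPLY_REMOVE) {
            items[write++] = items[read];
        }
    }
    return write;
}

/* Example callback: drop negatives, stop at the first zero. */
int toy_drop_negative(int *element)
{
    if (*element == 0) {
        return TOY_APPLY_STOP;
    }
    return (*element < 0) ? TOY_APPLY_REMOVE : TOY_APPLY_KEEP;
}
```

Applied to {1, -2, 3, 0, -4}, the walk removes -2, halts at 0, and never inspects -4, leaving four elements: 1, 3, 0, -4.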

The less familiar component in all this is probably TSRMLS_FETCH(). As you may recall from Part One, the TSRMLS_* macros are part of the Thread Safe Resource Management layer, and are necessary to keep one thread from trampling on another. Because the multi-argument version of zend_hash_apply() uses a vararg list, the tsrm_ls marker doesn't wind up getting passed into the walk() function. In order to recover it for use when we call back into php_hello_array_walk(), your function calls TSRMLS_FETCH() which performs a lookup to find the correct thread in the resource pool. (Note: This method is substantially slower than passing the argument directly, so use it only when unavoidable.)

Iterating through an array using this foreach-style approach is a common task, but often you'll be looking for a specific value in an array by index number or by associative key. This next function will return a value from an array passed in the first parameter based on the offset or key specified in the second parameter.

PHP_FUNCTION(hello_array_value)
{
    zval *zarray, *zoffset, **zvalue;
    long index = 0;
    char *key = NULL;
    int key_len = 0;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "az", &zarray, &zoffset) == FAILURE) {
        RETURN_NULL();
    }

    switch (Z_TYPE_P(zoffset)) {
        case IS_NULL:
            index = 0;
            break;
        case IS_DOUBLE:
           index = (long)Z_DVAL_P(zoffset);
            break;
        case IS_BOOL:
        case IS_LONG:
        case IS_RESOURCE:
            index = Z_LVAL_P(zoffset);
            break;
        case IS_STRING:
            key = Z_STRVAL_P(zoffset);
            key_len = Z_STRLEN_P(zoffset);
            break;
        case IS_ARRAY:
            key = "Array";
            key_len = sizeof("Array") - 1;
            break;
        case IS_OBJECT:
            key = "Object";
            key_len = sizeof("Object") - 1;
            break;
        default:
            key = "Unknown";
            key_len = sizeof("Unknown") - 1;
    }

    if (key && zend_hash_find(Z_ARRVAL_P(zarray), key, key_len + 1, (void**)&zvalue) == FAILURE) {
        RETURN_NULL();
    } else if (!key && zend_hash_index_find(Z_ARRVAL_P(zarray), index, (void**)&zvalue) == FAILURE) {
        RETURN_NULL();
    }

    *return_value = **zvalue;
    zval_copy_ctor(return_value);
}

This function starts off with a switch block that treats type conversion in much the same way as the Zend Engine would. NULL is treated as 0, Booleans are treated as their corresponding 0 or 1 values, doubles are cast to longs (and truncated in the process) and resources are cast to their numerical value. The treatment of resource types is a hangover from PHP 3, when resources really were just numbers used in a lookup and not a unique type unto themselves.

Arrays and objects are simply treated as a string literal of "Array" or "Object", since no honest attempt at conversion would actually make sense. The final default condition is put in as an ultra-careful catchall just in case this extension gets compiled against a future version of PHP, which may have additional data types.

Since key is only set to non-NULL if the function is looking for an associative key, it can use that value to decide whether it should use an associative or index based lookup. If the chosen lookup fails, it's because the key doesn't exist, and the function therefore returns NULL to indicate failure. Otherwise that zval is copied into return_value.

Symbol Tables as Arrays

If you've ever used the $GLOBALS array before, you know that every variable you declare and use in the global scope of a PHP script also appears in this array. Recalling that the internal representation of an array is a HashTable, one question comes to mind: "Is there some special place where the GLOBALS array can be found?" The answer is "Yes". It's in the Executor Globals structure as EG(symbol_table), which is of type HashTable (not HashTable*, mind you, just HashTable).

You already know how to find associatively keyed elements in an array, and now that you know where to find the global symbol table, it should be a cinch to look up variables from extension code:

PHP_FUNCTION(hello_get_global_var)
{
    char *varname;
    int varname_len;
    zval **varvalue;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &varname, &varname_len) == FAILURE) {
        RETURN_NULL();
    }

    if (zend_hash_find(&EG(symbol_table), varname, varname_len + 1, (void**)&varvalue) == FAILURE) {
        php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Undefined variable: %s", varname);
        RETURN_NULL();
    }

    *return_value = **varvalue;
    zval_copy_ctor(return_value);
}

This should all be intimately familiar to you by now. The function accepts a string parameter and uses that to find a variable in the global scope which it returns as a copy.

The one new item here is php_error_docref(). You'll find this function, or a near sibling thereof, throughout the PHP source tree. The first parameter is an alternate documentation reference (the current function is used by default). Next is the ubiquitous TSRMLS_CC, followed by a severity level for the error, and finally there's a printf() style format string and associated parameters for the actual text of the error message. It's important to always provide some kind of meaningful error whenever your function reaches a failure condition. In fact, now would be a good time to go back and add an error statement to hello_array_value(). The Sanity Check section at the end of this tutorial will include these as well.

In addition to the global symbol table, the Zend Engine also keeps track of a reference to the local symbol table. Since internal functions don't have symbol tables of their own (why would they need one after all?) the local symbol table actually refers to the local scope of the userland function that called the current internal function. Let's look at a simple function which sets a variable in the local scope:

PHP_FUNCTION(hello_set_local_var)
{
    zval *newvar;
    char *varname;
    int varname_len;
    zval *value;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sz", &varname, &varname_len, &value) == FAILURE) {
        RETURN_NULL();
    }

    ALLOC_INIT_ZVAL(newvar);
    *newvar = *value;
    zval_copy_ctor(newvar);
    zend_hash_add(EG(active_symbol_table), varname, varname_len + 1, &newvar, sizeof(zval*), NULL);

    RETURN_TRUE;
}

Absolutely nothing new here. Go ahead and build what you've got so far and run some test scripts against it. Make sure that what you expect to happen, does happen.

Reference Counting

So far, the only zvals we've added to HashTables have been newly created or freshly copied ones. They've stood alone, occupying their own resources and living nowhere but that one HashTable. As a language design concept, this approach to creating and copying variables is "good enough", but since you're accustomed to programming in C, you know that it's not uncommon to save memory and CPU time by not copying a large block of data unless you absolutely have to. Consider this userspace block of code:

<?php

    $a = file_get_contents('fourMegabyteLogFile.log');
    $b = $a;
    unset($a);

?>

If $a were copied to $b by doing a zval_copy_ctor() (which performs an estrndup() on the string contents in the process) then this short script would actually use up eight megabytes of memory to store two identical copies of the four megabyte file. Unsetting $a in the final step only adds insult to injury, since the original string gets efree()d. Had this been done in C it would have been a simple matter of: b = a; a = NULL;

Fortunately, the Zend Engine is a bit smarter than that. When $a is first created, an underlying zval is made for it of type string, with the contents of the log file. That zval is assigned to the $a variable through a call to zend_hash_add(). When $a is copied to $b, however, the Engine does something similar to the following:

{
    zval **value;

    zend_hash_find(EG(active_symbol_table), "a", sizeof("a"), (void**)&value);

    ZVAL_ADDREF(*value);

    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"), value, sizeof(zval*), NULL);
}

Of course, the real code is much more complex, but the important part to focus on here is ZVAL_ADDREF(). Remember that there are four principal elements in a zval. You've already seen type and value; this time you're working with refcount. As the name may imply, refcount is a counter of the number of times a particular zval is referenced in a symbol table, array, or elsewhere.

When you used ALLOC_INIT_ZVAL(), refcount was set to 1 so you didn't have to do anything with it in order to return it or add it into a HashTable a single time. In the code block above, you've retrieved a zval from a HashTable, but not removed it, so it has a refcount which matches the number of places it's already referenced from. In order to reference it from another location, you need to increase its reference count.

When the userspace code calls unset($a), the Engine performs a zval_ptr_dtor() on the variable. What you didn't see, in prior uses of zval_ptr_dtor(), is that this call doesn't necessarily destroy the zval and all its contents. What it actually does is decrease its refcount. If, and only if, the refcount reaches zero, does the Zend Engine destroy the zval...
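The add-a-reference, drop-a-reference cycle can be modeled in a few lines of plain C. This sketch uses a hypothetical toy_val type; toy_alloc(), toy_addref(), and toy_ptr_dtor() merely stand in for ALLOC_INIT_ZVAL(), ZVAL_ADDREF(), and zval_ptr_dtor(). The toy_live_values counter, and clearing the caller's pointer in toy_ptr_dtor(), are additions made purely so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long lval;
    unsigned int refcount;
} toy_val;

int toy_live_values = 0;            /* values allocated and not yet freed */

/* Allocates a value whose refcount starts at 1. */
toy_val *toy_alloc(long lval)
{
    toy_val *v = malloc(sizeof(*v));

    if (v == NULL) {
        return NULL;
    }
    v->lval = lval;
    v->refcount = 1;
    toy_live_values++;
    return v;
}

/* Records one more place that refers to v. */
void toy_addref(toy_val *v)
{
    assert(v != NULL);
    v->refcount++;
}

/* Drops one reference; frees the value only when none remain. */
void toy_ptr_dtor(toy_val **vp)
{
    assert(vp != NULL && *vp != NULL && (*vp)->refcount > 0);

    if (--(*vp)->refcount == 0) {
        free(*vp);
        toy_live_values--;
    }
    *vp = NULL;                     /* this slot no longer refers to it */
}
```

Mapped onto the script above: toy_alloc() creates $a, copying to $b is just toy_addref(), and unset($a) is a toy_ptr_dtor() that leaves the data alive because $b still holds a reference.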
