Caching mechanism #1

Open
enov opened this issue Sep 26, 2013 · 9 comments

@enov
Owner

enov commented Sep 26, 2013

I am opening this issue to redirect the caching discussion that was raised in the Kohana Forum here, as I don't want to hijack the forum with Aurora discussions.


Aurora definitely needs a caching mechanism to speed things up.

Now, Jack Ellis, you have a nice wrapper around Aurora->load, which checks for an existing cache entry before calling it.

I would prefer to integrate better with Aurora:

  • Have a property in Aurora files for the caching driver (like the $config property for db)
  • Have another property for the lifetime, $cache_lifetime
  • Abstract away all caching logic into a separate class, Aurora_Cache, that basically needs save and load methods
  • Call those methods inside Aurora::load()
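For illustration, the four points above might fit together roughly like this. This is a sketch only: the $cache and $cache_lifetime property names and the Aurora_Cache class are the proposal, not an existing API; only Cache::instance() is the existing Kohana Cache module.

```php
<?php defined('SYSPATH') or die('No direct script access.');

/**
 * Hypothetical Aurora_Cache helper: a thin wrapper around the Kohana
 * Cache module, driven by two properties declared on the Aurora file.
 */
class Aurora_Cache {

	public static function load($aurora, $key)
	{
		if (empty($aurora->cache))
			return NULL; // caching not enabled for this Aurora

		return Cache::instance($aurora->cache)->get($key);
	}

	public static function save($aurora, $key, $object)
	{
		if (empty($aurora->cache))
			return;

		Cache::instance($aurora->cache)->set($key, $object, $aurora->cache_lifetime);
	}

}
```

Aurora::load() would then try Aurora_Cache::load() first and fall back to the database, calling Aurora_Cache::save() on a miss.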

Also, the caching method you proposed will just cache models by keys. I would want the cache to work with any parameters passed through $params of Aurora::load(), and to cache anything, whether it is a collection or a model.

It may be better to serialize/unserialize, just the way you did, rather than use JSON.

On the other hand, Aurora needs caching at the json_encode/json_decode level much more than at the load level. But what would be the cache key for storing and retrieving?

@JackEllis

Nice.

Firstly, I've not profiled the speed, but I love the idea of storing models in a local array property, rather than just in the cache.

What I had in mind with the cache, and you can argue that this may be a foolish idea, was to cache a model indefinitely, until an update / delete function is called. Here is what I came up with, and am currently working with:

In this example, I changed around the key generation of the cache. So, for example, in the load_by_username() function I had, I did this:

<?php defined('SYSPATH') or die('No direct script access.');

class Data_User extends Data {

    public function load_by_username($username)
    {
        // Check to see if we have it in cache
        if ($model = $this->_check_cache('load_by_username', $username))
        {
            return $model;
        }

        // Retrieve model
        $model = Aurora::factory($this->_type)->by_username($username)->load()->single_result();

        // Add model to cache (and return it)
        return $this->_set_cache('load_by_username', $username, $model);
    }

}

As you can see, it passes through the load_by_username string as the method, and the identifier as the username. If you look at the code below, you can see that I then generated the cache key with the following:

$cache_key = '->'.$method.'('.$identifier.')';

I also had an 'external cache key prefix', which was simply a prefix in the format of [Data_MODEL]::

<?php defined('SYSPATH') or die('No direct script access.');

/**
 * The Data class is responsible for returning instances of Data_ classes (if they exist) or returning itself if they don't.
 * All Data classes will extend this
 */
class Data {

    private static $_instances = array();

    public static $_external_cache_instance = NULL;

    /**
     * Return a new instance or an existing one
     */
    public static function instance($type = NULL)
    {
        if ($type === NULL)
        {
            throw new Exception('Please specify the Data\'s type when calling Data::instance()');
        }

        if ( ! isset(self::$_instances[$type]))
        {
            $class_name = 'Data_'.ucfirst($type);

            if ( ! class_exists($class_name))
            {
                // Set the class name as this if the Data_ class doesn't exist
                $class_name = get_class();
            }

            // Create an instance of the Data class (and pass the type through)
            $instance = new $class_name($type);

            self::$_instances[$type] = $instance;
        }

        return self::$_instances[$type];
    }

    /**
     * @var  array  An array of records retrieved since the object was instantiated. Indexed by cache key.
     */
    protected $_local_cache = array();

    /**
 * @var  string  The cache key prefix for external caching
     */
    protected $_external_cache_key_prefix = '[Data]::';

    /**
 * @var  boolean  Whether to use the external cache (FALSE means only the local $_local_cache is used)
     */
    protected $_use_external_cache = TRUE;


    public function __construct($type = 'Data', $reset = FALSE)
    {
        // If the script requested a reset OR caching is off, reset all
        if ($reset === TRUE OR Kohana::$caching === FALSE)
        {
            $this->_use_external_cache = FALSE;
        }

        $this->_type = $type;

        // Change the external cache key based on the supplied type
        // eg. instance('user') becomes cache key [Data_User]::
        $this->_external_cache_key_prefix = str_replace('Data', 'Data_'.$type, $this->_external_cache_key_prefix);
    }

    protected function _clear_cache($method, $identifier)
    {
        // The cache key
        // e.g: _check_cache('load', 1) becomes ->load(1)
        $cache_key = '->'.$method.'('.$identifier.')';

        Cache::instance('file')->delete($this->_external_cache_key_prefix . $cache_key);
    }

    /**
     * Method to check to see if cache is stored locally or externally
     */
    protected function _check_cache($method, $identifier)
    {
        // The cache key
        // e.g: _check_cache('load', 1) becomes ->load(1)
        $cache_key = '->'.$method.'('.$identifier.')';

        // Check instances cache
        if ($model = Arr::get($this->_local_cache, $cache_key, FALSE))
        {
            // Return a model that has already been generated on this request
            return $model;
        }

        // Check that we can use external cache
        if ($this->_use_external_cache)
        {
            // External cache key
            $external_cache_key = $this->_external_cache_key_prefix . $cache_key;

            // Check external cache
            if (($model = Cache::instance('file')->get($external_cache_key)) !== NULL)
            {
                $this->_local_cache[$cache_key] = unserialize($model);

                return $this->_local_cache[$cache_key];
            }
        }

        return FALSE;
    }

    /**
     * Method to store the cached model
     */
    protected function _set_cache($method, $identifier, $model)
    {
        // The cache key
        // e.g: _check_cache('load', 1) becomes ->load(1)
        $cache_key = '->'.$method.'('.$identifier.')';

        // Add model to local cache
        $this->_local_cache[$cache_key] = $model;

        // If we are using external cache, store it
        if ($this->_use_external_cache)
        {
            Cache::instance('file')->set($this->_external_cache_key_prefix . $cache_key, serialize($model));
        }

        return $model;
    }

    /**
     * Load model by primary key
     */
    public function load($pk)
    {
        // Check to see if we have it in cache
        if ($model = $this->_check_cache('load', $pk))
        {
            return $model;
        }

        // Retrieve model
        $model = Aurora::factory($this->_type)->load($pk);

        // Add model to cache (and return it)
        return $this->_set_cache('load', $pk, $model);
    }

}

@enov
Owner Author

enov commented Sep 26, 2013

Hi @JackEllis , nice to see you here!

I've not profiled the speed

Well, concerning speed, Aurora is pretty OK when loading, unless you're doing something heavy in db_retrieve. And I am talking about loading a collection of thousands of models here, not a single model (which takes a few milliseconds). Also, the PHP version used can be of great significance: in the unscientific tests I've done, Aurora runs at least 300% faster under PHP 5.4 than under PHP 5.3.

So you should profile, so that you know where you stand. You might want to enable the default Kohana profiler; Aurora includes built-in profiling marks.

cache a model indefinitely, until an update / delete function is called

Why would this be a foolish idea? However, if you have a foreign key constraint with ON DELETE CASCADE and ON UPDATE CASCADE, your models would be affected by their relationships.


In general, I am still thinking about whether it would be a good idea to cache models individually by pkey. My reasoning: why would I want to cache models individually if I can have them fresh from the database fast enough? I would prefer to cache full collections (say, of thousands of models), as loading collections (calling the db_retrieve function thousands of times) is worth caching.

@enov
Owner Author

enov commented Sep 26, 2013

I love the idea of storing models in a local array property

Now I understand what you mean. So you might need to load the model in several places in a single HTTP request? I bet that would be pretty fast. But could you elaborate more on your application?

@JackEllis

Gotcha, well, with the speed profiling comment, I was talking more about the local cache vs. the file cache (for speed) :P I'm using 5.4 thankfully :)

I think that caching collections is definitely needed too, but I also think model caching is always going to be faster than retrieving it from the database again. When we look at scaling too, if we've got some sort of memcached setup, it's going to be faster loading models from that than from the database each time.

I'm currently building a bodybuilding related web app.

Well, I'll give you an example of where I'd use a 'non-expiring' cache:

  • User models: the user model isn't going to update on each page load / site visit; it will only update when an admin / the user modifies their account, and I'll force a cache refresh when that happens.
  • User measurement models: the user's measurement models (one for each day) only change when they get updated, so again, cache would be faster and would chill out requests on the database.

And yep, that's exactly what I meant with the local array property. So I'm writing a forum area, and if a user posts 6 times on the same page, I'll have a loaded user model (from cache / db) stored in my local array property, and it'll speed things up.

@enov
Owner Author

enov commented Sep 26, 2013

I think that caching collections is definitely needed too, but I also think model caching is always going to be faster than retrieving it from the database again.

OK, I see. I will try to build a simple caching system without changing the Aurora API a lot. So you could still use Au::load() just the way you do now; moreover, if you have the $cache_config and $cache_lifetime properties defined in your Aurora, caching will work transparently in the background.

Now, the cache key will be the resource name, or cname in terms of Aurora (ex: Feed_Item), plus a serialized $params from the load() function. Probably we also need to hash the resulting key.
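A minimal sketch of that key scheme (the function name here is made up, and sha1 is just one possible hash):

```php
<?php
// Build a cache key from the common name (cname) and the $params
// passed to load(), then hash it into a fixed-length cache id.
function aurora_cache_key($cname, $params)
{
	return sha1($cname . ':' . serialize($params));
}

// Same cname + same params always yield the same key:
$key = aurora_cache_key('Feed_Item', array('author_id' => 7, 'limit' => 10));
```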

When we look at scaling too, if we've got some sort of memcached setup, it's going to be faster loading models from that than from the database each time.

You're right about memcache. It's definitely faster.


Now, what about the JSON representation of the model/collection? I believe that's more important, as the JSON is the end result, and it's what you need to return to the browser. In that case, what would be the cache key?

  • If I have a model: a hash of the cname + pkey?
  • What about a collection: a hash of the cname + a serialized array of pkeys?

What do you think?

Thanks for the collaboration.

@enov
Owner Author

enov commented Sep 26, 2013

I am just thinking aloud.

So there will be 2 kinds of caching: one for models/collections when loading, and another for the JSON representation of the models/collections. Both will use the same cache config specified in Aurora.

The cache key for JSON will have 'JSON' prepended, in order to avoid collisions.
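Under that scheme, the two entries for the same load would differ only by the prefix. A sketch, with hypothetical variable names:

```php
<?php
// Same base key for both caches; the 'JSON' prefix keeps the two
// entries from colliding in the shared cache config.
$cname  = 'Feed_Item';
$params = array('author_id' => 7);

$key = sha1($cname . ':' . serialize($params));

$model_cache_key = $key;            // the loaded model/collection object
$json_cache_key  = 'JSON:' . $key;  // its cached JSON representation
```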

@JackEllis

I like the idea of having a serialized $params in the cache key a lot, that's a good idea.

With the caching, would there be an automatic refresh of the cache when the user saves over data? With collection caching, it would obviously just be the model ids rather than the model objects, because updating a model nested in the collection (on model update) would be tricky.

I'm still going to have the Data:: wrapper, solely because I've never been a fan of using ORM:: or Aurora:: etc. directly in the app, I always prefer having a wrapper ;)

Why does the JSON representation need caching? Can't the model just be json_encoded() or encoded with a custom function (which I'm sure I've seen)?

@enov
Owner Author

enov commented Sep 27, 2013

With the caching, would there be an automatic refresh of the cache when the user saves over data?

Well, I guess that's a valid concern. I think it's better if I slow down with the implementation now. I was planning a transparent caching mechanism, but one based only on lifetime. Now I am thinking I need to give this the time it needs to mature in my mind, before I do a messy implementation.

I'm still going to have the Data:: wrapper, solely because I've never been a fan of using ORM:: or Aurora:: etc. directly in the app, I always prefer having a wrapper ;)

That decision is yours; I see no harm and I respect that. I only hope that Aurora is useful.

Why does the JSON representation need caching? Can't the model just be json_encoded() or encoded with a custom function (which I'm sure I've seen)?

Ah, of course it needs caching, especially with Aurora's ability to invoke getters. See the Model_Category example here. Function calls (getters in this case) are expensive in PHP (much better in PHP 5.4+), and this is why implementing Interface_Aurora_JSON_Serialize would be a good idea if you find performance issues while profiling your application.

Again, no problem with a single model, as it will be json_encoded() in a blink.

Now, imagine that dragon of a collection again, with thousands of models, plus related nested models that will eventually produce a deeply nested, huge JSON. That needs some processing time, and it definitely needs caching.

As an example, take my medical/dental application, where I need to chart the 32 teeth, their surfaces (plus combinations of surfaces), all the kinds of treatments that can be applied to those surfaces, and their representation on an SVG dental chart. I have ended up with a list of tables abstracted by the related Model_Tooth, Model_Surface and Model_Treatments. The whole medical library is a huge JSON array. Moreover, I want to load the whole library in the browser to have speedy charts afterwards, as additional HTTP requests slow down working with those charts. Also, as this library is pretty much read-only, it only needs time-based caching. To solve this I have used Kohana Fragments on the view side, and I am also considering the use of the browser's local storage facility.
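For reference, the Fragment pattern mentioned above caches a rendered chunk of view output by name and lifetime. Roughly (the fragment name and the $library variable here are made up):

```php
<?php
// Fragment::load() echoes the cached copy and returns TRUE while fresh;
// otherwise it starts output buffering, we render the expensive JSON,
// and Fragment::save() stores the buffer for the next request.
if ( ! Fragment::load('medical_library_json', Date::DAY))
{
	echo json_encode($library); // the deeply nested medical library
	Fragment::save();
}
```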

@JackEllis

Fair enough. It's definitely got the potential to play a huge part in Kohana development though; there is no existing general data mapper pattern module for Kohana, there's only ORM :(

Crikey, I definitely see the need to cache the json in that scenario! Kohana fragments look fantastic, and combined with the backend caching, we can make web apps so damn fast :)

A final thought, with the caching, you could always offer users a $_caching_type property, which they'd set as either:

  • An integer (which would set an expiry date)
  • 'update_on_save', which would update the cache whenever save() or the update function is called
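Sketched as a property (hypothetical: $_caching_type is the suggestion, not an existing Aurora or Data property):

```php
<?php defined('SYSPATH') or die('No direct script access.');

class Data_User extends Data {

	// Either an integer lifetime in seconds...
	// protected $_caching_type = 3600;

	// ...or refresh the cached model whenever save() / update runs:
	protected $_caching_type = 'update_on_save';

}
```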

I don't know why, but I've never taken to the idea of having a lifetime for cache, and I can't decide whether it's because I'm being naive (and haven't run into many cases where it's needed) or because I'm being rational. I can't think of a scenario where I wouldn't have the ability to update the cache when a model is updated... Take the User_Measurements_Model I have: whenever they get updated, I change the cache. Basically, I don't see a positive, in that case, for having a cache lifetime, but I do see a positive (performance) for keeping the model cached indefinitely and only updating it when changes are made.
