

For sparsely executed functions, the ping architecture is better because PC is billed 24/7/365, whereas ping is billed per invoke (that being said, PC is a much cleaner solution). With the ping architecture you'd generally just invoke and then immediately exit, so there's very little billed execution time, which is why these ping functions can be cheap relative to PC. It's not perfect, but nothing on Lambda is. It's possible that the ping request for User2 gets routed to a live function, which finishes and then starts handling a real request for User1; when User2's real request comes in, that function is busy, so Lambda cold starts another one. Still, the ping architecture of warming up functions does scale (better) in this setup.
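To make the "invoke then immediately exit" idea concrete, here is a minimal sketch of a handler that short-circuits on a warm-up ping. The `warmup` key is a hypothetical convention between the scheduled pinger and the function, not something Lambda defines, and the "real work" branch is just a placeholder.

```python
import json


def handler(event, context):
    # Hypothetical convention: the scheduled pinger sends {"warmup": true}.
    # Returning right away keeps the billed duration of each ping very small.
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}

    # Placeholder for the real request handling.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "handled real request"}),
    }
```

The pinger would typically be a scheduled rule invoking the function every few minutes with that payload. Note that a ping only keeps warm whichever instance happens to receive it.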

Provisioned concurrency is a bit of a non-starter for cold start latency reduction in user-facing applications. It's a tool in the toolbox, but the problem is this: let's say you set the provisioned concurrency at 10, then 11 concurrent requests come in. That 11th request will still get a cold start. The "ping" technique you mentioned is one way to keep a function warm, but if Lambda decides to start a second instance of the function because the hot one is handling a request, then that person is going to take a warm-up hit and there's nothing you can do about that. If you are really latency sensitive then Lambda might not be the right choice for you. You can get a t4g.nano spot instance for about $3.50/month and keep that warm, but that is probably a whole lot more than you are paying for Lambda. Where provisioned/reserved concurrency comes in useful is keeping a runaway Lambda function from starving out other functions, either by guaranteeing a portion of the quota for the other functions or by keeping a function from exceeding a number of concurrent executions. Neither one of these is going to help you with cold start times: it is not defining the number of instances running, it's just reserving a portion of quota so that they can run. With provisioned concurrency you can run as many as your quota allows, but you are guaranteed to be able to handle as many concurrent instances as you have provisioned.
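The "10 warm, 11 requests" scenario can be sketched with a toy simulation. This is a simplified model, not how Lambda actually schedules: it assumes a request reuses any idle instance, spawns a new (cold) one otherwise, and it ignores init time and idle-instance reclamation. The function name and the request tuples are made up for illustration.

```python
def simulate_cold_starts(requests, prewarmed):
    """Count cold starts in a toy model.

    requests:  list of (arrival_time, duration) tuples.
    prewarmed: number of instances already initialized before any traffic.
    """
    # Time at which each existing instance becomes idle again.
    free_at = [0.0] * prewarmed
    cold = 0
    for arrival, duration in sorted(requests):
        # Find any instance that is idle when the request arrives.
        idle = next((i for i, t in enumerate(free_at) if t <= arrival), None)
        if idle is None:
            # Every instance is busy: a new one must be cold started.
            cold += 1
            free_at.append(arrival + duration)
        else:
            free_at[idle] = arrival + duration
    return cold


# 11 simultaneous one-second requests against 10 pre-warmed instances.
burst = [(0.0, 1.0)] * 11
print(simulate_cold_starts(burst, prewarmed=10))  # 1
```

The same model covers the ping case: with a single warm instance, two overlapping requests such as `[(0.0, 1.0), (0.5, 1.0)]` give one cold start, while two requests spaced far enough apart give none.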

I think you are thinking of reserved concurrency - with reserved concurrency you can only run as many instances as you have reserved.
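For reference, the two settings are separate API calls. The sketch below wraps the boto3 Lambda client methods `put_function_concurrency` (reserved) and `put_provisioned_concurrency_config` (provisioned); the wrapper names, function name, and alias are hypothetical, and the client is passed in so the snippet does not assume any particular credentials setup.

```python
def set_reserved_concurrency(lambda_client, function_name, limit):
    # Reserved concurrency: caps the function at `limit` concurrent
    # executions and sets that much of the account quota aside for it.
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


def set_provisioned_concurrency(lambda_client, function_name, qualifier, count):
    # Provisioned concurrency: keeps `count` execution environments
    # initialized. It is configured on a published version or alias,
    # which is what `qualifier` names.
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=count,
    )
```

Usage would look something like `set_provisioned_concurrency(boto3.client("lambda"), "my-function", "live", 10)`, where `my-function` and `live` stand in for your own function and alias.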
