I think that there may be another solution for this, that is the LLM write a valid code that calls the MCP's as functions. See it like a Python script, where each MCP is mapped to a function. A simple example:
def process(param1, param2):
my_data = mcp_get_data(param1)
sorted_data = mcp_sort(my_data, by=param2)
return sorted_data
From paper: "testing set: January 1, 2023, to December 31, 2023"
From the Llama 2 doc: "(...) some tuning data is more recent, up to July 2023."