{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Programming Assignment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Language model for the Shakespeare dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Instructions\n",
"\n",
"In this notebook, you will use the text preprocessing tools and RNN models to build a character-level language model. You will then train your model on the works of Shakespeare, and use the network to generate your own text.\n",
"\n",
"Some code cells are provided for you in the notebook. You should avoid editing the provided code, and make sure to execute the cells in order to avoid unexpected errors. Some cells begin with the line: \n",
"\n",
"`#### GRADED CELL ####`\n",
"\n",
"Don't move or edit this first line - this is what the automatic grader looks for to recognise graded cells. These cells require you to write your own code to complete them, and are automatically graded when you submit the notebook. Don't edit the function name or signature provided in these cells, otherwise the automatic grader might not function properly. Inside these graded cells, you can use any functions or classes that are imported below, but make sure you don't use any variables that are outside the scope of the function.\n",
"\n",
"### How to submit\n",
"\n",
"Complete all the tasks you are asked to do in the worksheet. When you have finished and are happy with your code, press the **Submit Assignment** button at the top of this notebook.\n",
"\n",
"### Let's get started!\n",
"\n",
"We'll start by running some imports and loading the dataset. Do not edit the existing imports in the following cell. If you would like to make further TensorFlow imports, you should add them there."
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [],
"source": [
"#### PACKAGE IMPORTS ####\n",
"\n",
"# Run this cell first to import all required packages. Do not make any imports elsewhere in the notebook\n",
"\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# If you would like to make further imports from tensorflow, add them here\n",
"\n",
"from tensorflow.keras.preprocessing.text import Tokenizer\n",
"from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
"from tensorflow.keras import Sequential\n",
"from tensorflow.keras.layers import Embedding, GRU, Dense"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"#### The Shakespeare dataset\n",
"\n",
"In this assignment, you will use a subset of the [Shakespeare dataset](http://shakespeare.mit.edu). It consists of a single text file with several excerpts concatenated together. The data is in raw text form and has not yet had any preprocessing. \n",
"\n",
"Your goal is to construct an unsupervised character-level sequence model that can generate text according to a distribution learned from the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load and inspect the dataset"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [],
"source": [
"# Load the text file into a string\n",
"\n",
"with open('data/Shakespeare.txt', 'r', encoding='utf-8') as file:\n",
"    text = file.read()"
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [],
"source": [
"# Create a list of chunks of text\n",
"\n",
"text_chunks = text.split('.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To give you a feel for what the text looks like, we will print a few randomly selected chunks from the list."
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"BIANCA:\n",
"And may you prove, sir, master of your art!\n",
"\n",
"LUCENTIO:\n",
"While you, sweet dear, prove mistress of my heart!\n",
"\n",
"HORTENSIO:\n",
"Quick proceeders, marry! Now, tell me, I pray,\n",
"You that durst swear at your mistress Bianca\n",
"Loved none in the world so well as Lucentio\n",
"\n",
"\n",
"LEONTES:\n",
"Good queen!\n",
"\n",
"PAULINA:\n",
"Good queen, my lord,\n",
"Good queen; I say good queen;\n",
"And would by combat make her good, so were I\n",
"A man, the worst about you\n",
"\n",
"\n",
"FLORIZEL:\n",
"I not purpose it\n",
"\n",
"\n",
"FLORIZEL:\n",
"O, that must be\n",
"I' the virtue of your daughter: one being dead,\n",
"I shall have more than you can dream of yet;\n",
"Enough then for your wonder\n",
"\n",
"\n",
"PAULINA:\n",
"Nay, rather, good my lords, be second to me:\n",
"Fear you his tyrannous passion more, alas,\n",
"Than the queen's life? a gracious innocent soul,\n",
"More free than he is jealous\n"
]
}
],
"source": [
"# Display some randomly selected text samples\n",
"\n",
"num_samples = 5\n",
"inx = np.random.choice(len(text_chunks), num_samples, replace=False)\n",
"for chunk in np.array(text_chunks)[inx]:\n",
"    print(chunk)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create a character-level tokenizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should now write a function that returns a `Tokenizer` object. The function takes a list of strings as an argument, and should create a `Tokenizer` according to the following specification:\n",
"\n",
"* The number of tokens should be unlimited (there should be as many as required by the dataset).\n",
"* Tokens should be created at the character level (not at the word level, which is the default behaviour).\n",
"* No characters should be filtered out or ignored.\n",
"* The original capitalization should be retained (do not convert the text to lower case).\n",
"\n",
"The `Tokenizer` should be fit to the `list_of_strings` argument and returned by the function. \n",
"\n",
"**Hint:** you may need to refer to the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer) for the `Tokenizer`."
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def create_character_tokenizer(list_of_strings):\n",
"    \"\"\"\n",
"    This function takes a list of strings as its argument. It should create \n",
"    and return a Tokenizer according to the above specifications. \n",
"    \"\"\"\n",
"    # Character-level tokenizer with an unlimited vocabulary, no filtered\n",
"    # characters, and the original capitalization retained\n",
"    tokenizer = Tokenizer(num_words=None,\n",
"                          filters=None,\n",
"                          lower=False,\n",
"                          char_level=True)\n",
"    tokenizer.fit_on_texts(list_of_strings)\n",
"    \n",
"    return tokenizer"
]
},
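{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional, ungraded sanity check, the cell below inspects the fitted tokenizer: it prints the size of the learned character vocabulary and a few of the character-to-token mappings. This is only a sketch for inspection, assuming the `tokenizer` variable created above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check (not graded): inspect the fitted character tokenizer\n",
"\n",
"# Number of distinct characters found in the dataset\n",
"print('Vocabulary size:', len(tokenizer.word_index))\n",
"\n",
"# A few of the learned character -> token mappings\n",
"print(dict(list(tokenizer.word_index.items())[:10]))"
]
},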
{
"cell_type": "code",
"execution_count": 215,
"metadata": {},
"outputs": [],
"source": [
"# Get the tokenizer\n",
"\n",
"tokenizer = create_character_tokenizer(text_chunks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tokenize the text\n",
"\n",
"You should now write a function to use the tokenizer to map each string in `text_chunks` to its corresponding encoded sequence. The following function takes a fitted `Tokenizer` object in the first argument (as returned by `create_character_tokenizer`) and a list of strings in the second argument. The function should return a list of lists, where each sublist is a sequence of integer tokens encoding the text sequences according to the mapping stored in the tokenizer.\n",
"\n",
"**Hint:** you may need to refer to the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer) for the `Tokenizer`."
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def strings_to_sequences(tokenizer, list_of_strings):\n",
"    \"\"\"\n",
"    This function takes a tokenizer object and a list of strings as its arguments.\n",
"    It should use the tokenizer to map the text chunks to sequences of tokens and\n",
"    then return this list of encoded sequences.\n",
"    \"\"\"\n",
"    sequences = tokenizer.texts_to_sequences(list_of_strings)\n",
"    \n",
"    return sequences"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {},
"outputs": [],
"source": [
"# Encode the text chunks into tokens\n",
"\n",
"seq_chunks = strings_to_sequences(tokenizer, text_chunks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Pad the encoded sequences and store them in a numpy array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since not all of the text chunks are the same length, you will need to pad them in order to train on batches. You should now complete the following function, which takes the list of lists of tokens, and creates a single numpy array with the token sequences in the rows, according to the following specification:\n",
"\n",
"* The longest allowed sequence should be 500 tokens. Any sequence that is longer should be shortened by truncating the beginning of the sequence.\n",
"* Use zeros for padding the sequences. The zero padding should be placed before the sequences as required.\n",
"\n",
"The function should then return the resulting numpy array.\n",
"\n",
"**Hint:** you may want to refer to the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences) for the `pad_sequences` function."
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def make_padded_dataset(sequence_chunks):\n",
"    \"\"\"\n",
"    This function takes a list of lists of tokenized sequences, and transforms\n",
"    them into a 2D numpy array, padding the sequences as necessary according to\n",
"    the above specification. The function should then return the numpy array.\n",
"    \"\"\"\n",
"    # Pre-pad with zeros and pre-truncate to a maximum length of 500 tokens\n",
"    padded_sequences = pad_sequences(sequence_chunks, maxlen=500,\n",
"                                     truncating='pre', padding='pre', value=0)\n",
"    \n",
"    return padded_sequences"
]
},
{
"cell_type": "code",
"execution_count": 219,
"metadata": {},
"outputs": [],
"source": [
"# Pad the token sequence chunks and get the numpy array\n",
"\n",
"padded_sequences = make_padded_dataset(seq_chunks)"
]
},
{
"cell_type": "code",
"execution_count": 220,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7886"
]
},
"execution_count": 220,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(padded_sequences)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create model inputs and targets\n",
"\n",
"Now you are ready to build your RNN model. The model will receive a sequence of characters and predict the next character in the sequence. At training time, the model can be passed an input sequence, with the target sequence shifted by one character.\n",
"\n",
"For example, the expression `To be or not to be` appears in Shakespeare's play 'Hamlet'. Given the input `To be or not to b`, the correct prediction is `o be or not to be`. Notice that the prediction is the same length as the input!\n",
"\n",
"\n",
"\n",
"You should now write the following function to create an input and target array from the current `padded_sequences` array. The function has a single argument that is a 2D numpy array of shape `(num_examples, max_seq_len)`. It should fulfil the following specification:\n",
"\n",
"* The function should return an input array and an output array, both of size `(num_examples, max_seq_len - 1)`.\n",
"* The input array should contain the first `max_seq_len - 1` tokens of each sequence. \n",
"* The output array should contain the last `max_seq_len - 1` tokens of each sequence. \n",
"\n",
"The function should then return the tuple `(input_array, output_array)`. Note that it is possible to complete this function using numpy indexing alone!"
]
},
{
"cell_type": "code",
"execution_count": 221,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def create_inputs_and_targets(array_of_sequences):\n",
"    \"\"\"\n",
"    This function takes a 2D numpy array of token sequences, and returns a tuple of two\n",
"    elements: the first element is the input array and the second element is the output\n",
"    array, which are defined according to the above specification.\n",
"    \"\"\"\n",
"    max_seq_len = array_of_sequences.shape[1]\n",
"    # Inputs are all but the last token; targets are all but the first token\n",
"    input_array = array_of_sequences[:, 0:max_seq_len - 1]\n",
"    output_array = array_of_sequences[:, 1:max_seq_len]\n",
"    \n",
"    return (input_array, output_array)"
]
},
{
"cell_type": "code",
"execution_count": 222,
"metadata": {},
"outputs": [],
"source": [
"# Create the input and output arrays\n",
"\n",
"input_seq, target_seq = create_inputs_and_targets(padded_sequences)"
]
},
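{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional, ungraded check, the cell below verifies the one-character shift concretely: for any example, the target row should equal the input row shifted left by one position. It assumes the `input_seq` and `target_seq` arrays created above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check (not graded): the target row is the input row shifted by one\n",
"\n",
"print(input_seq.shape, target_seq.shape)  # both (num_examples, 499)\n",
"print(np.array_equal(input_seq[0, 1:], target_seq[0, :-1]))  # expected: True"
]
},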
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Preprocess sequence array for stateful RNN\n",
"\n",
"We will build our RNN language model to be stateful, so that the internal state of the RNN will be maintained across batches. For this to be effective, we need to make sure that each element of every batch follows on from the corresponding element of the preceding batch (you may want to look back at the \"Stateful RNNs\" reading notebook earlier in the week).\n",
"\n",
"The following code processes the input and output sequence arrays so that they are ready to be split into batches for training a stateful RNN, by re-ordering the sequence examples (the rows) according to a specified batch size. "
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {},
"outputs": [],
"source": [
"# Fix the batch size for training\n",
"\n",
"batch_size = 32"
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [],
"source": [
"# Prepare input and output arrays for training the stateful RNN\n",
"\n",
"num_examples = input_seq.shape[0]\n",
"\n",
"# Keep a whole number of batches\n",
"num_processed_examples = num_examples - (num_examples % batch_size)\n",
"\n",
"input_seq = input_seq[:num_processed_examples]\n",
"target_seq = target_seq[:num_processed_examples]\n",
"\n",
"steps = num_processed_examples // batch_size  # steps per epoch\n",
"\n",
"# Re-order the rows so that consecutive batches contain consecutive examples\n",
"# at each row position\n",
"inx = np.empty((0,), dtype=np.int32)\n",
"for i in range(steps):\n",
"    inx = np.concatenate((inx, i + np.arange(0, num_processed_examples, steps)))\n",
"\n",
"input_seq_stateful = input_seq[inx]\n",
"target_seq_stateful = target_seq[inx]"
]
},
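{
"cell_type": "markdown",
"metadata": {},
"source": [
"The re-ordering above arranges the rows so that, at each row position, consecutive batches contain consecutive examples from the original array (row `j` of batch `i` is original example `i + j * steps`). The optional, ungraded cell below spot-checks this for row 0 of the first two batches, assuming the arrays created above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check (not graded): row 0 of batch 1 (re-ordered index batch_size)\n",
"# should be original example 1, continuing row 0 of batch 0 (original example 0)\n",
"\n",
"print(np.array_equal(input_seq_stateful[batch_size], input_seq[1]))  # expected: True"
]
},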
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Split the data into training and validation sets\n",
"\n",
"We will set aside approximately 20% of the data for validation."
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [],
"source": [
"# Create the training and validation splits\n",
"\n",
"num_train_examples = int(batch_size * ((0.8 * num_processed_examples) // batch_size))\n",
"\n",
"input_train = input_seq_stateful[:num_train_examples]\n",
"target_train = target_seq_stateful[:num_train_examples]\n",
"\n",
"input_valid = input_seq_stateful[num_train_examples:]\n",
"target_valid = target_seq_stateful[num_train_examples:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create training and validation Dataset objects\n",
"\n",
"You should now write a function to take the training and validation input and target arrays, and create training and validation `tf.data.Dataset` objects. The function takes an input array and target array in the first two arguments, and the batch size in the third argument. Your function should do the following:\n",
"\n",
"* Create a `Dataset` using the `from_tensor_slices` static method, passing in a tuple of the input and output numpy arrays.\n",
"* Batch the `Dataset` using the `batch_size` argument, setting `drop_remainder` to `True`. \n",
"\n",
"The function should then return the `Dataset` object."
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def make_Dataset(input_array, target_array, batch_size):\n",
"    \"\"\"\n",
"    This function takes two 2D numpy arrays in the first two arguments, and an integer\n",
"    batch_size in the third argument. It should create and return a Dataset object \n",
"    using the two numpy arrays and batch size according to the above specification.\n",
"    \"\"\"\n",
"    dataset = tf.data.Dataset.from_tensor_slices((input_array, target_array))\n",
"    # Drop the remainder so that every batch has exactly batch_size examples,\n",
"    # as required by the stateful RNN\n",
"    dataset = dataset.batch(batch_size, drop_remainder=True)\n",
"    \n",
"    return dataset"
]
},
{
"cell_type": "code",
"execution_count": 227,
"metadata": {},
"outputs": [],
"source": [
"# Create the training and validation Datasets\n",
"\n",
"train_data = make_Dataset(input_train, target_train, batch_size)\n",
"valid_data = make_Dataset(input_valid, target_valid, batch_size)"
]
},
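{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a final optional, ungraded check, the cell below pulls a single batch from the training `Dataset`: each element should be an (input, target) pair of integer tensors, both of shape `(batch_size, max_seq_len - 1)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check (not graded): inspect the shape of one training batch\n",
"\n",
"for input_batch, target_batch in train_data.take(1):\n",
"    print(input_batch.shape, target_batch.shape)  # expected: (32, 499) (32, 499)"
]
},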
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Build the recurrent neural network model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You are now ready to build your RNN character-level language model. You should write the following function to build the model; the function takes arguments for the batch size and vocabulary size (number of tokens). Using the Sequential API, your function should build your model according to the following specifications:\n",
"\n",
"* The first layer should be an Embedding layer with an embedding dimension of 256 and an input dimension of `vocab_size`, taken from the function argument.\n",
"* The Embedding layer should also mask the zero padding in the input sequences.\n",
"* The Embedding layer should also set the `batch_input_shape` to `(batch_size, None)` (a fixed batch size is required for stateful RNNs).\n",
"* The next layer should be a (uni-directional) GRU layer with 1024 units, set to be a stateful RNN layer.\n",
"* The GRU layer should return the full sequence, instead of just the output state at the final time step.\n",
"* The final layer should be a Dense layer with `vocab_size` units and no activation function.\n",
"\n",
"In total, the network should have 3 layers."
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def get_model(vocab_size, batch_size):\n",
"    \"\"\"\n",
"    This function takes a vocabulary size and batch size, and builds and returns a \n",
"    Sequential model according to the above specification.\n",
"    \"\"\"\n",
"    model = Sequential([\n",
"        Embedding(input_dim=vocab_size, output_dim=256, mask_zero=True,\n",
"                  batch_input_shape=(batch_size, None)),\n",
"        GRU(units=1024, stateful=True, return_sequences=True),\n",
"        Dense(vocab_size)\n",
"    ])\n",
"    \n",
"    return model"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential_3\"\n",
"_________________________________________________________________\n",
"Layer (type)                 Output Shape              Param #   \n",
"=================================================================\n",
"embedding_4 (Embedding)      (32, None, 256)           16640     \n",
"_________________________________________________________________\n",
"gru_3 (GRU)                  (32, None, 1024)          3938304   \n",
"_________________________________________________________________\n",
"dense_3 (Dense)              (32, None, 65)            66625     \n",
"=================================================================\n",
"Total params: 4,021,569\n",
"Trainable params: 4,021,569\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"# Build the model and print the model summary\n",
"\n",
"model = get_model(len(tokenizer.word_index) + 1, batch_size)\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Compile and train the model\n",
"\n",
"You are now ready to compile and train the model. For this model and dataset, the training time is very long, so training the model is not a requirement for this assignment. We have pre-trained a model for you (using the code below) and saved the model weights, which can be loaded to get the model predictions. \n",
"\n",
"It is recommended to use accelerator hardware (e.g. using Colab) when training this model. It would also be beneficial to increase the size of the model, e.g. by stacking extra recurrent layers."
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {},
"outputs": [],
"source": [
"# Choose whether to train a new model or load the pre-trained model\n",
"\n",
"skip_training = True"
]
},
{
"cell_type": "code",
"execution_count": 231,
"metadata": {},
"outputs": [],
"source": [
"# Compile and train the model, or load pre-trained weights\n",
"\n",
"if not skip_training:\n",
"    checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath='./models/ckpt',\n",
"                                                             save_weights_only=True,\n",
"                                                             save_best_only=True)\n",
"    model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
"                  metrics=['sparse_categorical_accuracy'])\n",
"    history = model.fit(train_data, epochs=15, validation_data=valid_data, \n",
"                        validation_steps=50, callbacks=[checkpoint_callback])"
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {},
"outputs": [],
"source": [
"# Save model history as a json file, or load it if using pre-trained weights\n",
"\n",
"if not skip_training:\n",
"    history_dict = dict()\n",
"    for k, v in history.history.items():\n",
"        history_dict[k] = [float(val) for val in history.history[k]]\n",
"    with open('models/history.json', 'w+') as json_file:\n",
"        json.dump(history_dict, json_file, sort_keys=True, indent=4)\n",
"else:\n",
"    with open('models/history.json', 'r') as json_file:\n",
"        history_dict = json.load(json_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Plot the learning curves"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1080x360 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Run this cell to plot accuracy vs epoch and loss vs epoch\n",
"\n",
"plt.figure(figsize=(15,5))\n",
"plt.subplot(121)\n",
"plt.plot(history_dict['sparse_categorical_accuracy'])\n",
"plt.plot(history_dict['val_sparse_categorical_accuracy'])\n",
"plt.title('Accuracy vs. epochs')\n",
"plt.ylabel('Accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.xticks(np.arange(len(history_dict['sparse_categorical_accuracy'])))\n",
"ax = plt.gca()\n",
"ax.set_xticklabels(1 + np.arange(len(history_dict['sparse_categorical_accuracy'])))\n",
"plt.legend(['Training', 'Validation'], loc='lower right')\n",
"\n",
"plt.subplot(122)\n",
"plt.plot(history_dict['loss'])\n",
"plt.plot(history_dict['val_loss'])\n",
"plt.title('Loss vs. epochs')\n",
"plt.ylabel('Loss')\n",
"plt.xlabel('Epoch')\n",
"plt.xticks(np.arange(len(history_dict['sparse_categorical_accuracy'])))\n",
"ax = plt.gca()\n",
"ax.set_xticklabels(1 + np.arange(len(history_dict['sparse_categorical_accuracy'])))\n",
"plt.legend(['Training', 'Validation'], loc='upper right')\n",
"plt.show() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Write a text generation algorithm\n",
"\n",
"You can now use the model to generate text! In order to generate a single text sequence, the model needs to be rebuilt with a batch size of 1."
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f7ea697cdd8>"
]
},
"execution_count": 234,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Re-build the model and load the saved weights\n",
"\n",
"model = get_model(len(tokenizer.word_index) + 1, batch_size=1)\n",
"model.load_weights(tf.train.latest_checkpoint('./models/'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An algorithm to generate text is as follows:\n",
"\n",
"1. Specify a seed string (e.g. `'ROMEO:'`) to get the network started, and define the number of characters for the model to generate, `num_generation_steps`.\n",
"2. Tokenize this sentence to obtain a list containing one list of the integer tokens.\n",
"3. Reset the initial state of the network. \n",
"4. Convert the token list into a Tensor (or numpy array) and pass it to your model as a batch of size one.\n",
"5. Get the model prediction (logits) for the last time step and extract the state of the recurrent layer.\n",
"6. Use the logits to construct a categorical distribution and sample a token from it.\n",
"7. Repeat the following for `num_generation_steps - 1` steps:\n",
"\n",
"    1. Use the saved state of the recurrent layer and the last sampled token to get new logit predictions.\n",
"    2. Use the logits to construct a new categorical distribution and sample a token from it.\n",
"    3. Save the updated state of the recurrent layer. \n",
"\n",
"8. Take the final list of tokens and convert to text using the Tokenizer.\n",
"\n",
"Note that the internal state of the recurrent layer can be accessed using the `states` property. For the GRU layer, it is a list of one variable:"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<tf.Variable 'gru_4/Variable:0' shape=(1, 1024) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>]"
]
},
"execution_count": 235,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Inspect the model's current recurrent state\n",
"\n",
"model.layers[1].states"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will break the algorithm down into two steps. First, you should complete the following function, which takes a sequence of tokens of any length and returns the model's prediction (the logits) for the last time step. The specification is as follows:\n",
"\n",
"* The token sequence will be a python list, containing one list of integer tokens, e.g. `[[1, 2, 3, 4]]`.\n",
"* The function should convert the list into a 2D Tensor or numpy array.\n",
"* If the function argument `initial_state` is `None`, then the function should reset the state of the recurrent layer to zeros.\n",
"* Otherwise, if the function argument `initial_state` is a 2D Tensor or numpy array, assign it as the internal state of the GRU layer.\n",
"* Get the model's prediction (logits) for the last time step only.\n",
"\n",
"The function should then return the logits as a 2D numpy array, where the first dimension is equal to 1 (batch size).\n",
"\n",
"**Hint:** the internal state of the recurrent layer can be reset to zeros using the `reset_states` method."
]
},
{
"cell_type": "code",
"execution_count": 251,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def get_logits(model, token_sequence, initial_state=None):\n",
"    \"\"\"\n",
"    This function takes a model object, a token sequence and an optional initial\n",
"    state for the recurrent layer. The function should return the logits prediction\n",
"    for the final time step as a 2D numpy array.\n",
"    \"\"\"\n",
"    # Convert the token sequence into a 2D numpy array (batch size of one)\n",
"    token_sequence = np.array(token_sequence)\n",
"    if initial_state is None:\n",
"        # Reset the GRU state to zeros\n",
"        model.layers[1].reset_states()\n",
"    else:\n",
"        # Assign the given initial state to the GRU layer's state variable\n",
"        model.layers[1].states[0].assign(initial_state)\n",
"    \n",
"    # The prediction has shape (1, seq_len, vocab_size); keep the last time step\n",
"    prediction = model.predict(token_sequence)\n",
"    \n",
"    return prediction[:, -1, :]"
]
},
{
"cell_type": "code",
"execution_count": 252,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-5.115009 , -8.0475 , 1.8503139 , 4.975737 , 2.8871026 ,\n",
" 4.1724453 , 4.437543 , 2.98841 , 0.52165407, 2.4829183 ,\n",
" 3.7190795 , -1.3724718 , 0.30877763, 3.4526918 , 1.047548 ,\n",
" 5.362768 , 3.5962613 , -7.138241 , 3.7604315 , 0.7027515 ,\n",
" 3.0481653 , 1.8022449 , 2.4562337 , 1.38671 , 1.5200446 ,\n",
" -8.977055 , 3.335435 , 1.4620303 , -0.6112106 , 5.011204 ,\n",
" 0.26996726, -1.633431 , 3.2971196 , -1.6511749 , -0.36638367,\n",
" 2.0932577 , 0.33700356, 1.7744293 , -8.395738 , 3.5642414 ,\n",
" -1.4412229 , 2.672013 , 1.1012034 , 3.8206997 , -7.389186 ,\n",
" 1.4560288 , -6.889679 , 0.6923045 , -6.5362697 , 0.43075308,\n",
" 1.169462 , 1.5707369 , -1.5701991 , 0.43702215, 0.89825916,\n",
" 0.96894693, -4.3608193 , -4.027752 , 1.015482 , -3.7264423 ,\n",
" -3.200475 , -2.9556887 , -3.034881 , -5.616536 , -4.1949883 ]],\n",
" dtype=float32)"
]
},
"execution_count": 252,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test the get_logits function by passing a dummy token sequence\n",
"\n",
"dummy_initial_state = tf.random.normal(model.layers[1].states[0].shape)\n",
"get_logits(model, [[1, 2, 3, 4]], initial_state=dummy_initial_state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should now write a function that takes a logits prediction similar to the above, uses it to create a categorical distribution, and samples a token from this distribution. The following function takes a 2D numpy array `logits` as an argument, and should return a single integer prediction that is sampled from the categorical distribution. \n",
"\n",
"**Hint:** you might find the `tf.random.categorical` function useful for this; see the documentation [here](https://www.tensorflow.org/api_docs/python/tf/random/categorical)."
]
},
{
"cell_type": "code",
"execution_count": 238,
"metadata": {},
"outputs": [],
"source": [
"#### GRADED CELL ####\n",
"\n",
"# Complete the following function.\n",
"# Make sure not to change the function name or arguments.\n",
"\n",
"def sample_token(logits):\n",
"    \"\"\"\n",
"    This function takes a 2D numpy array as an input, and constructs a \n",
"    categorical distribution using it. It should then sample from this\n",
"    distribution and return the sample as a single integer.\n",
"    \"\"\"\n",
"    # Draw one sample from the categorical distribution defined by the logits\n",
"    sample = tf.random.categorical(logits, num_samples=1)\n",
"    # The sample has shape (1, 1); extract it as a single integer\n",
"    sample = tf.squeeze(sample)\n",
"    \n",
"    return int(sample.numpy())"
]
},
{
"cell_type": "code",
"execution_count": 239,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 239,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test the sample_token function by passing dummy logits\n",
"\n",
"dummy_initial_state = tf.random.normal(model.layers[1].states[0].shape)\n",
"dummy_logits = get_logits(model, [[1, 2, 3, 4]], initial_state=dummy_initial_state)\n",
"sample_token(dummy_logits)"
]
},
{
"cell_type": "code",
"execution_count": 246,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"41 0\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 246,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check that sample_token returns both of two equally likely tokens\n",
"\n",
"logits_size = dummy_logits.shape[1]\n",
"dummy_logits = -np.inf*np.ones((1, logits_size))\n",
"dummy_logits[0, 20] = 0\n",
"sample_token(dummy_logits)\n",
"random_inx = np.random.choice(logits_size, 2, replace=False)\n",
"random_inx1, random_inx2 = random_inx[0], random_inx[1]\n",
"print(random_inx1, random_inx2)\n",
"dummy_logits = -np.inf*np.ones((1, logits_size))\n",
"dummy_logits[0, random_inx1] = 0\n",
"dummy_logits[0, random_inx2] = 0\n",
"sampled_token = []\n",
"for _ in range(100):\n",
"    sampled_token.append(sample_token(dummy_logits))\n",
"    \n",
"l_tokens, l_counts = np.unique(np.array(sampled_token), return_counts=True)\n",
"len(l_tokens) == 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate text from the model\n",
"\n",
"You are now ready to generate text from the model!"
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {},
"outputs": [],
"source": [
"# Create a seed string and number of generation steps\n",
"\n",
"init_string = 'ROMEO:'\n",
"num_generation_steps = 1000"
]
},
{
"cell_type": "code",
"execution_count": 244,
"metadata": {},
"outputs": [],
"source": [
"# Use the model to generate a token sequence\n",
"\n",
"token_sequence = tokenizer.texts_to_sequences([init_string])\n",
"initial_state = None\n",
"input_sequence = token_sequence\n",
"\n",
"for _ in range(num_generation_steps):\n",
"    logits = get_logits(model, input_sequence, initial_state=initial_state)\n",
"    sampled_token = sample_token(logits)\n",
"    token_sequence[0].append(sampled_token)\n",
"    input_sequence = [[sampled_token]]\n",
"    initial_state = model.layers[1].states[0].numpy()\n",
"    \n",
"print(tokenizer.sequences_to_texts(token_sequence)[0][::2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations on completing this programming assignment! In the next week of the course we will see how to build customised models and layers, and make custom training loops."
]
}
],
"metadata": {
"coursera": {
"course_slug": "tensor-flow-2-2",
"graded_item_id": "4eYSM",
"launcher_item_id": "HEV6h"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}