Web2py

From Training Material
Revision as of 14:22, 8 July 2019 by Apatrakov (talk | contribs) (→‎Package contents ⌘)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
title
Web2py for Developers
author
Alexander Patrakov

Course plan ⌘

  • Day 1: Introduction, Routing, Controllers, Views
  • Day 2: DAL
  • Day 3: Access control, I18N, Validators, Forms
  • Day 4: JS, CSS, Ajax, Deployment
  • We aim at ~45 slides per day

Let's start from the basics ⌘

  • WSGI
  • Purpose of web frameworks
  • Available web frameworks
  • Place of web2py in the ecosystem

HTTP protocol ⌘

  • HTTP/1.1 is defined by RFC 7230 - RFC 7235
  • Browsers are the most popular clients
  • There are non-browser clients
  • Web servers deliver HTML, images, JS, CSS, downloadable files, other content types

HTTP Request ⌘

GET /html/rfc7240 HTTP/1.1
Host: tools.ietf.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8,ru;q=0.6
Cookie: __cfduid=d105aee619396e6187d19c60cb032f4b71479467417

HTTP Response ⌘

HTTP/1.1 200 OK
Server: Apache/2.2.22 (Debian)
Last-Modified: Sun, 13 Nov 2016 10:39:20 GMT
ETag: "3cc378-c260-5412c55f8017f;54191604efade"
Accept-Ranges: bytes
Cache-Control: max-age=604800
Expires: Fri, 25 Nov 2016 11:15:54 GMT
Strict-Transport-Security: max-age=3600
Content-Length: 12952
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
<other headers omitted for brevity>

<!DOCTYPE html ...

Python web applications ⌘

  • WSGI is the interface between the web server and the web application
  • The web server calls a callable for every request
    • usually "application", but the name is configurable
  • It should take two parameters: environ and start_response
    • environ is a dict with CGI-like parameters
      • also contains the "input" stream used for POST content
    • start_response(status, response_headers, exc_info=None) is a callback
      • should be used by the application to set the response headers and status code
  • It should return an iterable which supplies pieces of content

Simplest Python web application ⌘

(use it with Apache's mod_wsgi, or run from command line)

#!/usr/bin/python2

from wsgiref.simple_server import make_server

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

if __name__ == "__main__":
    httpd = make_server('', 8000, application)
    httpd.serve_forever()

Exercise: try the raw WSGI ⌘

  • Can you print the complete WSGI environment to the browser?
  • Can you handle two different URLs? ("Hello world" and environment)

Tasks of a web framework ⌘

  • Routing
  • Sessions
  • Response formatting
  • Input validation
  • Database communication
  • Structuring your application

Available web frameworks ⌘

  • Batteries included
    • Self-contained and comprehensive
    • Examples: Django, TurboGears, web2py
  • Best of breed
    • Just a thin layer of glue between other libraries
    • Examples: Flask, Pyramid, CherryPy, Bottle

web2py as compared to other web frameworks ⌘

  • Full-stack
  • MVC
  • Quite different from all other frameworks
    • It is recommended that you also learn some other framework, just to see what's specific to web2py
    • Flask is recommended as something diametrally opposite
  • A bit old-fashioned:
    • Does not rely on gettext, uses its own i18n mechanism
    • Does not use unicode internally
  • Unique features:
    • Has a web-based IDE
    • Uses Python as a language for views
    • Uses Database Abstraction Layer (DAL) instead of Object-Relational Mapper (ORM)
    • Requires no installation
  • Has a lot of built-in functionality

web2py criticisms that are still valid ⌘

  • Non-traditional use of execfile instead of import (1)
  • Too different from established Python idioms (2)
    • Can teach you things that are frowned upon by general Python community

Download ⌘

Don't install! ⌘

  • No installation needed
    • Don't install web2py from pip - it's outdated there
    • Note: there's no traditional setup.py
    • It still may be a good idea to use virtualenv, due to other packages
  • Debian and Ubuntu adapted web2py for their packaging system
    • No reverse dependencies
    • Reasonably modern only in Debian Testing and Ubuntu 16.04+
      • There is danger of it being removed due to licensing issues
      • Absolutely no support, at least two known holes

First steps ⌘

  • Package contents and directory structure
  • Running the default applications

Package contents ⌘

  • ./web2py.py: the main script (executable!)
  • ./Makefile: builds documentation and installers
  • ./applications: default applications and your applications
  • ./deposit: if you remove applications, they will be archived here
  • ./gluon: the web framework sits here
  • ./gluon/packages/dal: Database Abstraction Layer
  • ./handlers, ./examples, ./scripts: example integrations with web servers
  • ./logs: by default, logs go here
  • ./site-packages: place for third-party Python modules

Running the development server ⌘

  • python2 web2py.py
    • Will ask for port and admin password
    • Optional GUI uses Tk
  • Help: python2 web2py.py --help
  • Shell in some application: python2 web2py.py -S welcome
    • It's a good idea to have IPython (now Jupyter) installed
    • Add -M to load models

Structure of an application ⌘

  • ./models/*.py: models
    • You can create subdirectories if some models are needed only by some controllers
  • ./views: views (mostly HTML with special syntax for printing, branching and looping)
  • ./controllers/*.py: controllers
    • ./controllers/default.py: default controller
    • ./controllers/appadmin.py: application admin (use for manipulating database tables)
  • ./modules/*.py: your own importable Python modules
  • ./private: non-Python files private to the application
    • e.g. appconfig.ini
  • ./sessions: sessions are stored here by default
    • It is also possible to store them in a database
  • ./static: static files (js, css, fonts, images, ...)
    • are not processed by Python
  • ./uploads: files that your users upload are renamed and stored here

MVC ⌘

  • Model, View, Controller
  • Essentially a buzzword that only indicates that Models, Views and Controllers exist, with some separation of concerns
    • Wikipedia's articles on MVC, MVP (Model, View, Presenter) and similar software architectures are written with classical GUI-based software in mind
    • The English Wikipedia MVC interaction diagram (below) doesn't apply to web frameworks
      • In web2py and most other web frameworks, the model does not actively update the view
    • Non-English editions of Wikipedia even have different interaction diagrams!

MVC as described on English Wikipedia MVC as described on German Wikipedia MVC as described on French Wikipedia

MVC in web frameworks: Model ⌘

  • Model:
    • Has business-domain knowledge (both logic and data)
    • Provides API that can be used by the controller to read state and submit updates
    • Does not contain presentation logic
    • Note: in many other ORMs and web frameworks (not web2py!), a model is only an abstraction over data storage (such as database) and validation mechanism. I.e. there is no accent on the API that makes sense in the business problem domain.
      • Good article on the issue, by Julien Pauli
      • In web2py, the logic that detects which parts of the website menu should exist is part of the model. Even if the menu is not stored in the database.
      • Even the data you get from web services can be expressed as a Model!

MVC in web frameworks: View ⌘

  • View:
    • Presents data to the user
    • Usually as HTML
      • Data can come from the model and from the controller

MVC in web frameworks: Controller ⌘

  • Controller:
    • Reacts to user actions (in the form of HTTP requests)
    • Decides what to do
    • Updates model state, or queries state from the model
    • Asks the appropriate view to render itself, supplies the necessary data
  • Frequently advocated approach: fat models, skinny controllers

Routing ⌘

  • Mapping URLs to handlers
    • I.e. to functions in controllers
  • Accessing URL parameters

Default routing ⌘

  • http://localhost:8000/app/controller/function.ext/arg0/arg1?foo=bar&a=b
  • applications/app/controller.py is executed in a special environment
  • function() is called without arguments
    • Can return a string
    • Can raise an HTTP exception
    • Can return a dict - in this case, a view is executed
      • return locals() is a common idiom
  • arg0 and arg1 go to request.args
  • foo=bar&a=b go to request.vars
  • There are also request.cookies
  • There is a URL() function to construct a URL from app, controller, function, args and vars

Exposed functions ⌘

# pattern to find exposed functions in controller
regex_expose = re.compile(
    '^def\s+(?P<name>_?[a-zA-Z0-9]\w*)\( *\)\s*:',
    flags=re.M)

Non-exposed functions ⌘

  • So, here is what won't match NOW:
    • Imported functions
    • Functions with arguments
    • Functions starting with more than one underscore
    • Functions defined under if True:
    • Earlier there was an advice to include a space before the ":" - it doesn't work now
  • Assume that your function WILL be exposed via URL, unless official documentation says it won't
    • The documentation only says: "Functions that take arguments or start with a double underscore are not publicly exposed and can only be called by other functions."
    • Yeah, the situation with imported functions is stupid

Custom routing ⌘

  • / maps to the "init" app (or "welcome"), "default" controller, "index()" function
  • Names can be changed in routes.py
    • default_application, default_controller, default_function
  • You can also define routes_in and routes_out (as tuples of 2-tuples)
    • routes_in maps HTTP URL patterns to /app/controller/function
    • routes_out maps /app/controller/function patterns to URLs (for the URL() function)

Routing in other web frameworks ⌘

  • The existence of default routing is web2py-specific
    • By extension, so are application, controller and function names in the URL
  • Other frameworks require explicit export of their equivalent of controller functions
    • Django: needs an entry in urls.py that maps a URL regex to a view
    • Flask: needs a decorator that applies the URL template, like this: @app.route("/all_topics/<int:page>")
      • This also passes the page as the parameter - no need to mess with request.args
  • In other frameworks, there is no real need to use any equivalent of request.vars - this is usually handled (including the necessary validation) by form classes
    • You can do this in web2py, too - but disable CSRF protection

Views ⌘

  • Default view for controller.function: views/controller/function.ext
    • So the same data can be formatted as HTML, JSON or whatever else
  • There are also generic views, enabled by default only for requests from localhost

View syntax ⌘

  • Write whatever HTML is needed - it will be delivered to the browser
  • Add Python code in {{ ... }} braces
    • Indentation is ignored, the code is reformatted internally
    • ":" indents, "pass" unindents
    • There is some special syntax for printing variables: {{=something}}
      • Something will be converted to HTML as appropriate
        • Thus avoiding XSS attacks
      • Variables come from the dict returned from the controller
      • Some other objects are also available
    • Special syntax for extending and including other views, to be discussed later

XSS attacks ⌘

  • XSS = Cross-Site Scripting
  • Interpretation of malicious data as HTML (or, worse, JavaScript) by the browser
    • Reflected XSS: data from the query string are interpreted as HTML
    • Stored XSS: data from the database are interpreted as HTML
  • Typical exploitation scenario: write a "blog post" that also sends document.cookie to attacker when viewed by admin
    • Admin's session is now hijacked!
  • Solution: convert data from text to HTML
    • Also known as "escaping"

Implicit "imports" ⌘

  • Not real imports - all files are effectively executed in the same namespace
    • The result is quite close to concatenation, but with bytecode caching
    • This does mean that all your models are redefined on every request - something that other web frameworks avoid
  • In controllers, web2py implicitly "imports" everything from your models
  • In views, web2py automatically puts into the global scope the dict returned by your function
  • These implicit "imports" often confuse IDEs
    • Solution: buy IDE with special support
    • PyCharm Professional Edition qualifies, Community Edition doesn't
  • They also confuse pylint

How objects are printed ⌘

  • If .xml() is callable, it is called, and the result is treated as HTML
    • Hint: if you already have some known-good HTML, construct an XML(some_html) from it
  • Otherwise, the object is transformed to a string, escaped, and printed
  • Implementation details:
    • The parser is in gluon.template.TemplateParser.parse()
    • It creates a Node object that contains response.write(something)
    • Then see gluon.globals.Response.write()
      • It calls self.body.write(to_native(xmlescape(data)))
      • Finally, see gluon.html.xmlescape()

How to format objects for JavaScript ⌘

  • var myobj = {{=mydict}}; won't work
    • Reason: you want to transform to JSON, not to string with HTML escaping
  • This pattern works:
<!-- at the top -->
{{ from gluon.serializers import json }}
<script><!--
var myobj = {{=XML(json(mydict))}};
--></script>
  • json() converts dict to a JSON string
  • XML() prevents HTML escaping
  • There is also ASSIGNJS() helper that produces the assignments

Exercise: available globals ⌘

  • Figure out what is available as globals in your controllers
  • Same for views

Storage objects ⌘

  • Importable from gluon.storage
  • class Storage inherits from dict
    • Also allows to get or set keys using the attribute syntax: st.foo = bar
    • Difference: undefined attributes are created on-the-fly and become None (so no KeyError)
  • There is also class StorageList
    • Difference: undefined attributes are [], not None
      • Useful for keeping lists
  • There is also class List
    • Like list, but can be addressed using (...), not only [...] syntax
      • Does not raise IndexError when used with (...)

Request ⌘

  • Available as request in controllers and views
  • Has some interesting attributes
    • request.args: a List with parts from the URL after function.ext
    • request.get_vars: GET-style parameters from the URL, as a Storage
    • There are also request.post_vars, and request.vars contains both GET and POST vars
    • request.env: WSGI environment, as a Storage
      • request.env.http_accept-language is a value of the Accept-Language HTTP header
      • request.env.request_method is the request method
    • request.cookies: cookies. Not a Storage.
    • request.application, request.controller, request.function, request.extension
    • request.is_local, request.is_https

Figuring out the client's IP address ⌘

  • request.env.remote_addr is always the correct answer
    • If it isn't, you are not running the web server in the recommended configuration
      • E.g. you are proxying to Rocket
      • Or proxying to Apache and forgot to install mod_remoteip
  • It is OK to use request.is_local
  • False leads:
    • request.env.http_x_forwarded_for
      • Verify that it only comes from your proxy
      • Try to fake it using Firefox and this extension, see how your system reacts
    • request.client
      • Should never be used - tries X-Forwarded-For first (and uses the first address there), and REMOTE_ADDR next
      • In Django, all use of X-Forwarded-* was treated as a vulnerability and removed

If you still want to use reverse proxy ⌘

  • Please unset these incoming headers unconditionally
    • X-Forwarded-For
    • X-Forwarded-Proto
  • Make sure that the Host: header is correct

Accessing request.vars securely ⌘

  • Attack vector: http://example.com/a/c/f?foo=bar&foo=baz
    • Result: request.vars.foo == ['bar', 'baz']
    • A list, not a string!
  • Solution: always use one of these:
    • request.vars.getfirst('foo')
    • request.vars.getlist('foo')
    • request.vars.getlast('foo')

request.folder ⌘

  • The base filder of the application
  • Contains models, views and controllers subdirectories
  • Please use request.folder to construct application-specific paths, like illustrated below
    • Example from the default admin app
deny_file = os.path.join(request.folder, 'private', 'hosts.deny')
  • You can use this even in models. That's business logic, after all...

Response ⌘

  • This is a subclass of Storage
  • Fields with obvious meaning:
    • status (HTTP error code), headers (a dict), cookies
  • Fields used by the default layout.html file:
    • title, subtitle
    • flash (a string or object to be flashed on the page), meta (Storage for meta tags), menu, files (JS and CSS from static files)
    • Use something like response.files.append(URL('static','css/bootstrap.min.css'))
    • Put {{ response.include_meta() }} and {{ response.include_files() }} in your layout.html
      • It will minify and virtually version your static files
    • A debugging/profiling toolbar: {{=response.toolbar()}}
      • Note the difference: you have to display the return value
    • You can add anything else - this is sometimes useful for communicating information from a view to the layout

Session ⌘

  • Per-client Storage that survives between requests
  • Separate session per application
  • Identifier stored in a cookie
  • Contents stored on disk as a file by default
    • Can be reconfigured for database-backed storage

Flashing messages ⌘

  • Useful for notifying the user about the outcome of their action
  • If you are sure that you won't redirect right now: response.flash = 'your message'
  • A version that survives redirect: session.flash = 'your message'
  • Some other web frameworks can flash multiple messages, web2py can't
  • In your view, refer to response.flash

Exercise: flash code study ⌘

  • Find in web2py sources where session.flash is moved to response.flash

Exercise: HTML in flashed message ⌘

  • Can you flash a message with some HTML?

Exercise: Frozen Wiki ⌘

  • Create a controller function and a view that reads a markmin page from a file and renders it
    • Use the MARKMIN() function to create something printable from the MarkDown-like markup
    • Get the page name from the arguments, look under private/pages, maybe use gluon.fileutils.read_file()
    • Think about error handling
      • Just raise an HTTP(404) on any error
  • Add a sidebar with links to all pages, mark the current one
    • Hint: gluon.fileutils.listdir() may be useful

Template hierarchy ⌘

  • Two mechanisms: extend and include
  • Extend means "use that other file as a master"
    • Write {{ extend 'layout.html' }} in a view
    • You may want to set some variables before writing {{ extend 'layout.html' }}
      • Anything that you write before {{ extend 'layout.html' }} will be executed or printed in the beginning
      • Then, the interpreted contents of 'layout.html' will follow.
    • In layout.html, {{ include }} includes stuff that follows {{ extend 'layout.html' }} in the slave view
  • {{ include 'sidebar.html' }} means just that: include that other file
    • Indeed useful for sidebars

Blocks ⌘

  • May be more familiar to Django and Flask/Jinja2 users
  • In layout.html, define {{ block foo }}default content{{ end }}
  • In slave views, optionally redefine the block
    • The new content will replace the original
    • You can get the default content using the {{ super }} keyword

Database topics ⌘

  • Making SQL queries from Python
  • Ways to avoid SQL injection
  • DAL vs ORM
  • Defining tables
  • Selecting data
  • Joins
  • Inserting, updating and deleting data
  • Transactions

Making SQL queries from Python ⌘

  • DB API 2.0 specification
    • A standard which database-interfacing modules follow
    • You can connect to the database, issue queries, get results
    • There is a mechanism for using prepared queries and substitution of parameters
import sqlite3
db = sqlite3.connect('test.sqlite')
cur = db.cursor()
cur.execute('SELECT id, dname FROM dept WHERE loc = ?', ('Berlin',))
rows = list(cur)
db.commit()

SQL injection ⌘

  • A serious vulnerability
  • Appears when an application constructs SQL query dynamically by concatenating fixed strings with attacker-provided data
  • By using special characters (like ') inside their data, attacker can trick SQL database to interpret their data as additional SQL queries
    • Can bypass access restrictions, exfiltrate data, make unauthorized changes

How to avoid SQL injection ⌘

  • Answer from 1990s: forbid bad characters in user input
    • Bad: how would you register Catherine Anne O'Hara as a user?
  • Answer from 2000s: "escape" user data before concatenation so that special characters are properly interpreted as data, not as SQL syntax
    • Bad: too hard to keep track what's escaped and what's not
    • Too easy to forget to escape data
  • Answer from 2010s: don't build queries by concatenating fixed strings with user-provided data
    • "Prepared Statement" APIs exist that clearly separate SQL from external data in queries
      • Usually (but not always), they stay separate on the wire. If the database doesn't provide this option, the client library will escape and concatenate strings as necessary.

How to avoid SQL injection, once again ⌘

Don't:

cur.execute("SELECT id, dname FROM dept WHERE loc = '%s'" % (city,))

Do:

cur.execute('SELECT id, dname FROM dept WHERE loc = ?', (city,))

Or just let web2py generate queries for you.

Exercise: perform SQL injection ⌘

  • Which value of city will cause the bad query to read department with id = 1? The actual department name is not known to an attacker
    • Hint: one of the possible answers abuses the "OR" operator

DAL ⌘

  • Database Abstraction Layer
  • Different from a typical ORM
    • Does not have a separate class for a row of each table
    • Uses a single Row class
    • Does not use prepared statements
      • Performance implications ☹
  • Can be used standalone
from gluon.dal import DAL, Field
db = DAL('mysql://user:password@sql.example.com/db')
# You can have multiple database connections

Why we need DAL ⌘

  • SQLite on dev, PostgreSQL in production
    • Slight differences in SQL syntax
    • E.g. here is how to create a suitable id field in SQL:
      • SQLite: id INTEGER PRIMARY KEY (autoincrement is optional, and the AUTOINCREMENT keyword, if present, must be placed last)
      • MySQL: id INTEGER AUTO_INCREMENT PRIMARY KEY
      • PostgreSQL: id SERIAL PRIMARY KEY
  • Construct queries programmatically
    • Important if a user should be able to e.g. filter or sort records on multiple criteria
    • "IN" queries are just cumbersome with raw DB API
  • Access result rows in more object-oriented way
    • E.g. use dictionary-like objects with attribute names, not just tuples
  • Handle framework-specific objects such as uploads
  • Attach validators, interact with forms

Defining tables ⌘

  • SQL (for MySQL):
CREATE TABLE blog_post
    id INTEGER AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    body TEXT;
  • DAL:
db.define_table('blog_post',
                 Field('title', 'string', length=255),
                 Field('body', 'text')
)
  • id is implied
  • After that, you'll be able to insert, delete and edit records via the "appadmin" controler
    • It asks for the admin password

Primitive data types ⌘

DAL SQL Notes
'string', length=??? VARCHAR(???) default length is 512
'boolean' usually CHAR(1) '0' or '1' is stored
'text' TEXT
'blob' BLOB stored base64-encoded even if the database can handle binary data
'integer', 'bigint' INTEGER Or a bigger database-specific type
'float', 'double' FLOAT, DOUBLE
'date', 'time' DATE, TIME No timezone information
'datetime' TIMESTAMP (DATETIME for MySQL) This is a DATE and a TIME together

Other data types ⌘

DAL SQL Notes
'json' TEXT Or a database-specific JSON type
'password' VARCHAR(???) Treated exactly as a string
'upload' VARCHAR(???) A filename is stored
'list:integer', 'list:string' TEXT Or a database-specific type. Think twice before denormalizing!

Uploads ⌘

  • There is a special "upload" field type
  • Has some keyword arguments
    • uploadfolder - where in the filesystem to place the uploaded file
      • It will be renamed for safety, but the original filename will be stored
    • autodelete - whether to delete the file together with the record (by default False)
    • uploadfield - it is possible to store files in the database, as blobs, all we need is the name of that blob field
    • authorize - callback to decide whether the user is authorized to download the file back

Foreign keys ⌘

  • SQL:
CREATE TABLE ...
   foo INTEGER REFERENCES bar(id);
  • DAL:
    Field('foo', 'references bar')
  • Always references id
  • May or may not be NULL

Constraints ⌘

  • There are several ways to specify constraints in the Field constructor
  • Keyword arguments
    • required=True forbids DAL to insert empty values
    • requires=... sets up validators for forms
      • No effect for values inserted not through forms
    • unique=True, notnull=True
      • Correspond to UNIQUE and NOT NULL SQL clauses
      • Enforced at database level
      • Multiple NULLs do not violate the UNIQUE constraint, except in MS SQL Server
    • There is a bug: using forms, you cannot insert NULLs into unique-but-nullable non-string non-reference (e.g. plain integer) fields

Reverse-engineering an existing database ⌘

  • Use scripts/extract_*_models.py
    • Does not work here for sqlite

Default validation ⌘

  • web2py inserts default validators for things like unique and notnull
  • You may still want to duplicate them in order to override the default error messages

On NULL values ⌘

  • By default, web2py does not prohibit NULLs in fields
  • It is easy to create a NULL in an integer field
    • Just don't enter anything
  • There can be a blank option in a field with a list of options
    • requires=IS_IN_SET(...)
    • Selecting the blank option inserts a NULL
  • There is some UI-related confusion between a NULL and an empty string in string field
    • Other web frameworks recommend making all string fields nonnull because of that
    • By default, web2py inserts an empty string
    • Add requires=IS_EMPTY_OR(IS_NOT_EMPTY()) if you want an empty input to become a NULL

Adding requirements ⌘

  • It is possible to list requirements in the field definition
  • It is also possible to add them dynamically
db.table.field.requires.append(...)

Formatting records ⌘

  • Add a represent=(lambda value, row: ...) to Field constructor arguments
    • This is completely optional
  • Add a format=labmda row: ... to Table constructor elements
    • As a Row is a Storable which is a subclass of a dict, strings with format expressions also work:
      • format="%(name)s (%(price)s EUR)"
    • Quirk: the default IS_IN_DB(...) validator is added to the "references" fields only if the referenced fields can be formatted
    • This also affects the automatic choice of widgets in SQLFORM

Relations ⌘

  • 1:1: don't do this, just put both sides into the same table
    • If you must (because of physical storage considerations), just use the same primary key and write helpers or joins to retrieve the other side
  • 1:many: add a non-NULL reference field to the "many" side that references the "1" side
  • 1:0-1: add a non-NULL unique reference field to the "0-1" side that references the "1" side
  • many:many: use an intermediate table that has two non-NULL fields that reference both sides

Normalization ⌘

  • Goal: make sure that the very structure of the database makes it impossible to have inconsistent data
    • If you don't update your data, denormalization may be OK
  • Traditional theory of relational databases doesn't really accept IDs
  • Always think "if there is no ID, what else could serve as a key?"
  • Useful rule of thumb, approximately equivalent to BCNF (also known as 3.5NF):
    • Each attribute must provide a fact about the key (1NF), the whole key (2NF), and nothing but the key (BCNF).
      • Both 2NF and 3NF are concerned equally with all candidate keys of a table and not just any one key.
      • So help me Codd!
  • Any use of "list:..." type breaks 1NF.

Exercise: shop model ⌘

  • You have a shop that sells goods from several categories and delivers them to a customer-supplied address
  • We need to keep prices, product images, and orders (including past orders)
  • Create database tables for it using the DAL syntax
  • Play with it using appadmin

Indexes ⌘

  • The book documents them as not natively available
  • Work in progress? code landed on 2016-05-26 and is not part of any stable release
    • Syntax: db.mytable.create_index('index_name', db.mytable.myfield_a, db.mytable.myfield_b)
    • For unique indexes, pass unique=True
    • Unit tests exist, too!

Migrations ⌘

  • Database needs to evolve due to changing business requirements
    • Schema migrations
    • Data migrations
  • web2py has fully-automatic, non-configurable, schema migrations and no data migrations
    • Migrations work by comparing the models with *.table files that were created during the previous migration
    • Applied on the first request that notices the mismatch

Migrations in other ORMs ⌘

  • Django: migrations (both schema and data) are first-class citizens
    • Schema migrations are auto-generated, but can be edited
  • SQLAlchemy: no built-in support
    • Frequently used with Alembic, which has good documentation for schema migrations
      • Schema migrations are auto-generated, but can be edited
      • Data migrations are possible but only documented in external blogs
  • Peewee
    • Some built-in schema migrations but no built-in tracking of what has been applied
    • No clear winner among tracker tools
    • Interesting idea: fully automated schema migrations that introspect the current schema and diff it against the models

Inserting rows ⌘

  • Fire and forget:
    • row_id = db.tbl.insert(foo='bar', val=42)
    • All fields not explicitly mentioned will get the default values
  • You can also create a dictionary, populate it as needed, and pass it:
    • row_id = db.tbl.insert(**mydict)
  • There is no way to insert anything except VALUES(...) or DEFAULT VALUES
    • I.e. no INSERT INTO tbl SELECT ...
  • Don't forget to call db.commit()
    • This is automatically called at the end of each web request, but not called if you are using an interactive shell or a script
    • An alternative is to call db.rollback()

Simple selects ⌘

  • If there is a table "category", these statements work:
row_id = db.category.insert(name="Motherboards")   # let's suppose it returns 42

row = db.category(42)                    # select by id
row = db.category(name="Motherboards")   # select by any attribute

Exercise: try to break inserts and selects ⌘

  • From the command line, connect to sqlite://demo.db or just the default sqlite file
    • You will need a databases folder in your application
  • Figure out what happens if you:
    • ... violate the UNIQUE constraint
    • ... violate the NOT NULL constraint
    • ... break the required=True constraint
    • ... break something in requires=...
    • ... attempt to select something that doesn't exist
    • ... attempt to select something ambiguous

How other ORMs treat database errors ⌘

  • You have observed a bad thing: a database-specific IntegrityError exception
    • Impossible to catch without writing database-specific code or catching all exceptions
      • PyLint will hate you if you catch all exceptions
  • Other ORMs (Django ORM, SQLAlchemy, Peewee) automatically convert such exceptions into their own non-database-specific class

More complex selects ⌘

  • Build expressions, select the results
rows = db((db.category.name == "Motherboards") | (db.category.name == "Video Cards")).select()
  • Watch operator priority! Parentheses are strictly necessary here.
  • Comparisons with None work and are transformed into IS NULL and IS NOT NULL appropriately.
  • db.category transforms into category.id IS NOT NULL (which is quite pointless)

How queries work in DAL ⌘

  • Relevant classes: Expression, Query
    • Expression represents something that can be SELECTed or put on the right hand side of the "=" in UPDATE
    • Query represent something that can be put in a WHERE clause
  • A Field (like db.category.name) is already an Expression due to inheritance
  • It is possible to:
    • Combine Expressions using arithmetical operators
    • Compare with other Expressions and constants to form Queries
    • Perform logical operations against Queries
    • Extract parts (substrings, components of the date)
  • The resulting Expression contains nothing except a reference to the db connection, first and second operands (as Expressions), and the required operation
    • I.e. Expressions and Queries form a tree that explains how to get the result
    • Expression does not contain its result!
      • db.category.name == "Motherboards" is not a boolean! It's a Query

Why such syntax ⌘

  • The ~, & and | operators are used instead of traditional not, and and or operators because the traditional forms cannot be overrideen from Python

How queries work in DAL, continued ⌘

  • DAL(query) results in a Set
    • This binds a WHERE clause and a database connection
    • On a Set, you can call select(), update() or delete()

Preparations for exercises, part 1 ⌘

Create an sqlite database by copy-pasting these SQL statements

$ sqlite3 demo.sqlite
CREATE TABLE dept (id INTEGER PRIMARY KEY, dname VARCHAR(255) NOT NULL, loc VARCHAR(255) NOT NULL);
INSERT INTO dept VALUES(10,'ACCOUNTING','NEW YORK');
INSERT INTO dept VALUES(20,'RESEARCH','LONDON');
INSERT INTO dept VALUES(30,'SALES','PARIS');
INSERT INTO dept VALUES(40,'OPERATIONS','BERLIN');
CREATE TABLE salgrade(id INTEGER PRIMARY KEY, losal DECIMAL(10,2) NOT NULL, hisal DECIMAL(10,2) NOT NULL);
INSERT INTO salgrade VALUES(1,700,1200);
INSERT INTO salgrade VALUES(2,1201,1400);
INSERT INTO salgrade VALUES(3,1401,2000);
INSERT INTO salgrade VALUES(4,2001,3000);
INSERT INTO salgrade VALUES(5,3001,9999);

Preparations for exercises, part 2 ⌘

CREATE TABLE emp (id INTEGER PRIMARY KEY, ename VARCHAR(255) NOT NULL,
  job VARCHAR(255) NOT NULL, mgr INTEGER REFERENCES emp(id),
  hiredate DATE NOT NULL, sal DECIMAL(10,2) NOT NULL,
  comm DECIMAL(10,2), deptno INTEGER NOT NULL REFERENCES dept(id));
INSERT INTO emp VALUES(7839,'BUSH','PRESIDENT',NULL,'1981-11-17',5000,NULL,10);
INSERT INTO emp VALUES(7698,'BLAIR','MANAGER',7839,'1981-05-01',2850,NULL,30);
INSERT INTO emp VALUES(7566,'PUTIN','MANAGER',7839,'1981-04-02',2975,NULL,20);
INSERT INTO emp VALUES(7902,'TOOSK','ANALYST',7566,'1981-12-03',3000,NULL,20);
INSERT INTO emp VALUES(7369,'THATCHER','CLERK',7902,'1980-12-17',800,NULL,20);
INSERT INTO emp VALUES(7499,'BAROSSO','SALESMAN',7698,'1981-02-20',1600,300,30);
INSERT INTO emp VALUES(7521,'WALTON','SALESMAN',7698,'1981-02-22',1250,500,30);
INSERT INTO emp VALUES(7654,'CHIRACK','SALESMAN',7698,'1981-09-28',1250,1400,30);
INSERT INTO emp VALUES(7782,'MERKEL','MANAGER',7839,'1981-06-09',2450,NULL,10);
INSERT INTO emp VALUES(7788,'CARNEGIE','ANALYST',7566,'1982-12-09',3000,NULL,20);
INSERT INTO emp VALUES(7844,'GATES','SALESMAN',7698,'1981-09-08',1500,0,30);
INSERT INTO emp VALUES(7876,'FORD','CLERK',7788,'1983-01-12',1100,NULL,20);
INSERT INTO emp VALUES(7900,'BUFFETT','CLERK',7698,'1981-12-03',950,NULL,30);
INSERT INTO emp VALUES(7934,'ELISON','CLERK',7782,'1982-01-23',1300,NULL,10);
.exit

Copy the resulting demo.sqlite file into applications/welcome/databases/demo.sqlite

Exercises from the SQL course ⌘

  • Run an interactive shell (best done with IPython installed):
$ python2 web2py.py -S welcome
  • Connect to the database
  • Define tables
    • Use migrate=False argument for define_table to avoid errors due to tables that already exist
  • Go through SQL statements and exercises from the SQL Programming training, produce the same result in DAL
    • Consult the next few slides for help with DAL syntax
    • Some exercises are not doable with DAL, please identify them and explain what's wrong with them
  • It's a good idea to use as_list() on select() results to see them
db(db.salgrade).select().as_list()
  • Use db._lastsql to look at the last executed SQL statement
  • Or, use _select() instead of select() to preview the SQL without executing it


SQL features not available natively ⌘

  • Typecasting to string
  • Concatenation of non-string expressions with strings
    • To concatenate strings, use +
    • Anyway you can select parts separately and concatenate them in Python
  • Concatenation where the first operand is a fixed string
  • Subtraction where the first operand is a fixed number
  • Date and time manipuation other than .year(), .month() and .day()
    • Do it in Python
  • BETWEEN
    • Reformulate in terms of two comparisons
  • Selecting COUNT(*) together with something else during aggregation
    • Use COUNT(id) for the same result
  • UNION, INTERSECT, EXCEPT
    • Python-level alternatives exist as |, & operations on Rows
  • Subqueries
  • The list above is incomplete!

More DAL syntax ⌘

  • SELECT a, b, c FROM tbl
    • db(db.tbl).select(db.tbl.a, db.tbl.b, db.tbl.c)
  • SELECT DISTINCT a, b FROM tbl
    • db(db.tbl).select(db.tbl.a, db.tbl.b, distinct=True)
  • SELECT a + b AS x FROM tbl
    • db(db.tbl).select((db.tbl.a + db.tbl.b).with_alias('x'))
    • with_alias() can also be applied to tables - important for self-joins
  • IFNULL: rethink in terms of CASE or COALESCE
    • E.g. to convert them to 0: (db.tbl.a != None).case(db.tbl.a, 0)
    • Simpler solution: db.tbl.a.coalesce(0)
  • ... WHERE a IN (10, 20)
    • db.tbl.a.belongs((10, 20))
  • ... WHERE a LIKE '%A%'
    • db.tbl.a.like('%A%'), or db.tbl.a.contains('A')
      • .startswith(...), .endswith(...) are also available

Sorting and aggregation ⌘

  • SELECT ... ORDER BY a ASC, b DESC
    • ...select(..., orderby=db.tbl.a | ~db.tbl.b)
  • AVG(a) and other aggregate functions
    • db.tbl.a.avg() and similar
  • COUNT(a) (counts non-NULL values)
    • db.tbl.a.count()
  • COUNT(DISTINCT a) (counts distinct non-NULL values)
    • db.tbl.a.count(distinct=True)
  • SELECT ... GROUP BY a HAVING a > 100
    • ...select(..., groupby=db.tbl.a, having=db.tbl.a > 100)
    • HAVING omits unwanted groups, just like WHERE omits unwanted records

If the whole result is not interesting... ⌘

  • Get the first row of the result, or None
    • select(...).first()
      • last() also exists
      • Think about sorting
      • If there are multiple rows, this retrieves them all and throws away everything except the first
    • Other ORMs allow special syntax for you to say "this query must return exactly one result, raise an exception if this is not the case"
  • Only count results, don't select
    • Use .count() instead of .select()
  • Check whether the result would be non-empty
    • Use .exists() instead of .select()
  • Implement pagination
    • select(..., limitby=(a, b))
      • Asks the database to find b first rows, discard the first a, and send the result to web2py

Selecting from multiple tables ⌘

  • It just works
db()._select(db.emp.ename, db.dept.dname)
SELECT "emp"."ename", "dept"."dname" FROM "emp", "dept";
  • This yields a cartesian product
  • Add conditions to make the result useful
db(db.emp.deptno == db.dept.id)._select(db.emp.ename, db.dept.dname)
SELECT "emp"."ename", "dept"."dname" FROM "dept", "emp" WHERE ("emp"."deptno" = "dept"."id");

Explicit joins ⌘

  • A join condition is a parameter to select
db(where).select(..., join=db.tbl2.on(query))
  • This is translated to:
SELECT ... FROM "tbl1" JOIN "tbl2" ON <query> WHERE <where>
  • Which is equivalent to:
SELECT ... FROM "tbl1", "tbl2" WHERE <query> AND <where>
  • You can provide a list as a join=... argument to produce multiple joins
  • Similar syntax works for left joins: select(..., left=db.tbl2.on(query))
    • Left joins also provide rows from the left table that don't have a pair from the right table (columns from the right table become NULLs)
      • Use WHERE tbl2.id IS NULL to select only these orphaned rows
  • Table on the left to JOIN or LEFT JOIN is determined by magic code in _build_joins_for_select()
    • It is, roughly, a cross join of all tables mentioned anywhere in the query, fields, or join expressions, but not mentioned just before .on(...)

Deleting rows ⌘

  • If you have a row: row.delete_record()
  • If you have a query: db(query).delete()
    • No tunable parameters
    • Cannot delete from join
  • There are ._before_delete and ._after_delete callbacks that can be defined on tables

Delete and references ⌘

  • What happens if you delete some row that has rows in other tables that reference it?
    • This depends on the ON DELETE parameter that you pass in SQL when creating the referencing table
      • Exposed as the ondelete=... keyword argument in the Field constructor
    • The default in web2py is CASCADE, which means "delete every reference that would be broken"
    • Other options: SET NULL (obvious), RESTRICT (fail immediately), NO ACTION (fail at the end of the transaction if anything is still broken)
      • The _before_delete and _after_delete callbacks are not called for rows deleted because of the cascade

Updating rows ⌘

  • If you have a Row:
row.foo = 42
row.bar = "what?"
row.update_record()
  • Alternatively:
row.update_record(foo=42, bar="what?")
  • Bulk or atomic update without Row objects:
db(...).update(foo=42, bar="what?")

update_or_insert ⌘

  • update_or_insert: update an existing record, or insert a new one if it doesn't exist
    • def update_or_insert(self, _key=DEFAULT, **values): ...
    • _key (the first positional parameter) can be a dict or a Query, or omitted altogether
    • _key is used to find the record that has to be updated
      • If omitted, **values are used both for searching and for insertion
    • returns the id of the inserted record, or None if there was only an update
  • The method is racy
    • look at the implementation
    • think what happens if a new record is inserted in a different thread at the worst possible time
      • also think about unique keys

Running custom SQL ⌘

  • db.executesql("SELECT foo FROM bar WHERE id = 123")
  • Supports placeholders but does not document their exact syntax
    • Actually, it depends on the underlying database
  • Can return result as rows, as a list of tuples, or as dicts

Transaction isolation levels ⌘

  • SET TRANSACTION ISOLATION LEVEL
    • READ UNCOMMITTED (allows reading of uncommitted values)
    • READ COMMITTED (still allows rereading the value and finding that it has changed) (default in PostgreSQL)
    • REPEATABLE READ (but still over a non-repeatable set of rows) (default in MySQL)
    • SERIALIZABLE (no anomalies) (default in sqlite)
  • Because of these anomalies, sometimes it is necessary to tell the database to use a write-lock even if only a SELECT is done
    • SQL: SELECT ... WHERE ... FOR UPDATE
    • DAL: db(...).select(..., for_update=True)
  • Beware: at transaction isolation levels REPEATABLE READ and SERIALIZABLE, the database can throw a "serialization failure" error at any time there is a conflicting write
    • web2py does not handle this at all


Atomic updates ⌘

  • Useful for handling votes
  • Watch the generated SQL
  • It may be wrong to get the current number of votes, add 1 in Python, and write back
  • Best solution: UPDATE voting SET votes = votes + 1 WHERE ...
    • Can this be expressed in DAL?

Automatic referenced attribute access ⌘

  • This works on reference fields
dname = db.emp(ename="BUSH").deptno.dname
  • Automatically issues a second SELECT to fetch the department name
  • This pattern may be OK if attribute access is not in a loop
    • Performance problems otherwise
    • Better reformulate in terms of a join
  • Other ORMs have things like select_related to formulate the JOIN automatically

Authentication, Authorization & Access Control ⌘

  • Based on users and groups
    • Users belong to groups
      • You can manage them using appadmin
    • Groups can have permissions to create, read, update, delete records in tables
      • Enforced by the deprecated "CRUD", or by auth.accessible_query()
    • You can create custom permissions
    • There are decorators to check those permissions

Built-in user-related functionality ⌘

  • Registration
    • With or without "email as username"
    • With various styles of email confirmation
    • With or without CAPTCHA (Recaptcha or custom)
  • Sending invitations (aka bulk registration) - disabled by default
  • Login, logout
    • Optinal two-factor authentication
  • Editing user profile
    • With arbitrary extra fields
  • Resetting password
  • Retrieving login by email address
  • Impersonation of any user by admins
    • Actually has_permission('impersonate', 'auth_user', user_id)

Default auth tables schema ⌘

Schema auth.png

Authentication controller and view ⌘

  • In a model:
from gluon.tools import Auth
auth = Auth(db, host_names=myconf.get('host.names'), ... other settings ...)
auth.define_tables(... various settings ...)
auth.settings.foo = 'bar'
  • auth.settings defaults come from gluon.tools.Auth.default_settings
    • Disable what's unneeded in auth.settings.actions_disabled
    • You can also customize all messages
  • In a controller:
def user():
    return dict(form=auth())
  • See the comments in the welcome app about available subpaths
  • The template lives in templates/user.html
    • Inspect request.args(0) to see whether this is registration, login or something else

Adding new fields to auth tables ⌘

  • If you want a proper username (instead of "email is username"):
    • auth.define_tables(username=True)
  • If you want completely custom fields:
auth.settings.extra_fields['auth_user']= [
  Field(...),
  Field(...),
  Field(...)]
  • New custom fields will also be visible on the Profile page

Tracking who changed the record ⌘

  • There is auth.signature
    • Just a canned collection of is_active, created_on, created_by, modified_on, modified_by fields with suitable defaults
    • Add it to your table as you would add a Field
    • No further code is needed due to automatic default and update keyword arguments on the added fields

Tracking old versions of rows ⌘

  • Only one of:
    • form.process(onaccept=auth.archive) if you want to cover only changes done via one specific form
    • auth.enable_record_versioning(tables) if you want to cover all changes except ON DELETE CASCADE
  • Only works for tables with is_active and modified_on
  • Old versions of tbl rows are stored in tbl_archive
    • If you need to access them, define the model:
db.define_table('tbl_archive',
   Field('current_record', 'reference tbl'),
   db.tbl)
  • Rows are never deleted from tbl, but marked as is_active=False
  • All queries are transparently modified to exclude such rows

Access Control passive security primitives ⌘

auth.user                # a copy of the relevant row drom db.auth_user, or None
auth.is_logged_in()      # returns a boolean
auth.is_impersonating()  # see if the current user is a mind-controlled zombie
auth.has_membership(group_id=None, user_id=None, role=None)  # logs all calls!!!
auth.has_permission(...) # checks record-level permission
auth.accessible_query(name, table)  # a DAL query that filters out inaccessible records
  • Note: auth.user.id will crash for the not-logged-in case. Use auth.user_id if you want to get a None instead

Access Control active security primitives ⌘

auth.logout_bare()  # programmatically log out
auth.login_user(user_id)  # programmatically log in, pre-authenticated
auth.login_bare(username, password)  # check password and maybe login

auth.register_bare(...)  # programmatically register a user
auth.add_group(role, description)
auth.del_group(group_id)
auth.add_membership(user_id=..., group_id=...)
auth.del_membership(user_id=..., group_id=...)
auth.add_permission(...)
auth.del_permission(...)

Access Control decorators ⌘

  • Available decorators
@auth.requires_login()
@auth.requires_membership('some_group')
@auth.requires_permission('permission', 'subject')
@auth.requires(boolean_expression)
  • Put them on your controller functions or on any auxiliary functions that you want to protect
    • Don't use more than one decorator per function - use @auth.requires instead
  • On failure, they apply the first suitable action from the list:
    • respond with 401 unauthorized to Ajax requests
    • redirect to URL passed in the otherwise=... keyword parameter if the parameter exists
    • respond with 403 Forbidden to REST requests
    • redirect to the login form otherwise

Protecting the whole controller ⌘

  • Add some code at the top of it, outside of any function
    • See a working example in appadmin.py

Password security basics ⌘

  • Knowledge of a password gives you access to some protected resource
  • There are crooks that want others' passwords
  • Threat model:
    • Online guessing: enforce restrictions (minumum length, ...) in validators
    • Network sniffers: enforce https
    • Keyloggers: use two-factor authentication
    • Stolen database: store passwords securely, to withstand offline attacks
      • Not as plain text (obviously)
      • Not as mere hashes (due to dictionary attacks)
        • Hackers have hashes of all words and will just search for them
        • This also gives out whether two passwords are identical
      • Not as simple salted hashes (still easy to brute-force on a GPU for human-friendly password lengths)
        • Brute-forcing of an 8-character password (48-bit keyspace) hashed via SHA1 with a salt takes 5 days on a single GPU
      • The current state of the art is PBKDF2, and web2py uses it

PBKDF2 ⌘

  • Password-based key derivation function
  • RFC2898
  • Irreversible
  • Tunable complexity (via the number of hash iterations)
    • Makes password checking (and therefore offline cracking) deliberately slow
      • Now your login form is a DoS attack vector
      • Now any API that accepts authentication via the user's login and password is also a problem!
      • Solution: don't expose APIs that authenticate via the user's login and password

Interoperability with other web frameworks ⌘

  • Various web frameworks store PBKDF2 salted hashes differently
    • Django: pbkdf2_sha256$30000$DZRxycy7VktY$V9LL/vMnUyn7OeKU4h4dpvLvWPTNOzl+NsPw95Fm9eQ=
      • algorithm $ number of iterations $ salt $ base64(hash)
      • The length of the PBKDF2 hash used by Django is the same as the length of the native hash output
    • Web2py: pbkdf2(1000,20,sha512)$bceea2673a6a024e$7a3fe52afaf373bc5a32aa60d38687845e061a18
      • pbkdf2(iterations, length, algorithm) $ salt $ hex(hash)
  • In both cases, the password is crackme!

Exercise with cryptography ⌘

  • Can you take apart the Django-provided PBKDF2 hash and put it into web2py auth_user table so that web2py recognizes it?
    • pbkdf2_sha256$30000$DZRxycy7VktY$V9LL/vMnUyn7OeKU4h4dpvLvWPTNOzl+NsPw95Fm9eQ=
  • Why is the reverse impossible out of the box?

i18n in web2py ⌘

  • The T() function translates text
    • The language is taken from HTTP headers sent by the client
    • Use T.force('it') to force Italian
    • /app/lang/controller/function also works if languages parameter is set properly in routes.py
  • The return value is not always a string
    • By default, it is a lazy translation object
    • It remembers "OK, we need to translate this string into this language"
    • Translation happens when a string value is needed
      • This involves a cache lookup, then a mechanism to locate the appropriate *.py file with translations, and import the dict from it

Translating text with parameters ⌘

  • Don't: T("Increase salary of ") + str(employee)
    • Translators may need to insert the employee name somewhere in the middle!
  • Do: T("Increase salary of %(employee)s", symbols=dict(employee=employee))
    • The "%s" style is supported, too
      • Needs a tuple or a list as symbols
      • If there is only one substitution parameter, provide it directly: T("Increase salary of %s", employee)

Pluralization ⌘

  • Just put the word that needs pluralization in a %%{marker}
    • web2py will look for the first parameter in symbols and use it to correctly pluralize the word "marker"
    • Use %%{marker[1]} or %%{marker(markers)} if you need to pluralize according to symbols[1] or symbols['markers']
  • For languages with regular rules for making plural forms, web2py does it automatically
  • For other languages, you have to maintain the explicit dictionary of plural forms in languages/plural-*.py

Translation of blocks of HTML ⌘

  • This happens in views
<strong>{{=T("There are no modules")}}</strong>
  • A problem arises if the translatable string includes some HTML tags (e.g. a <span> to highlight something, or a link)
    • web2py authors mostly manage to avoid it, but sometimes they don't:
    <div class="editor-bar-bottom" style="margin-top:9px;">
        <a class="editbutton btn" href="{{=URL(...)}}" id="restore">{{=T('restore')}}</a>
        {{=T('currently saved or')}}
        <a class="editbutton btn" href="{{=URL(...)}}" id="revert">{{=T('revert')}}</a>
        {{=T('to  previous version.')}}
    </div>
  • No good solution here
  • Discussion of the same issue in the context of Django

Why an English "translation" is needed ⌘

  • web2py does not assume that the source text is in English
    • It is in unknown language, so to say
    • What if the browser says Accept-Language: en-US,en;q=0.8,ru;q=0.6?
      • If a Russian translation exists but English doesn't, Russian will be returned

Translating identical strings differently ⌘

  • Sometimes useful
  • Include context into the original string
    • Here is how: T("Phrase ## context")

i18n in web2py vs other frameworks ⌘

  • web2py does not use gettext due to concerns about thread-safety
    • "We cannot use gettext because it translates/pluralizes assuming the entire process needs the same language."
    • Misguided? Or at least very strange for a modern framework!
      • Somehow, this is not a problem for Django and Pyramid
      • Flask is often used with Babel which has its own thread-safe parsers for gettext-compatible *.po and *.mo files
  • There is also a concern about gettext being too difficult to set up on Windows
    • Irrelevant now, because there is Babel
  • There are tools to convert between web2py *.py translation files and gettext *.po files
    • This is useful if you want to set up something like pootle for your translators or use Transiflex instead of giving them access to the admin interface

Form topics ⌘

  • HTML helpers
  • Validators
  • Forms
  • SQLFORM

HTML helper classes ⌘

  • Generate tags
  • Have the .xml() method
  • Full list is in gluon.html module
  • No HTML5-specific tags, but this is not a big problem
  • General pattern:
    • Call the constructor
    • Pass inner text and nested tags as positional arguments
      • Text will be escaped properly
    • Pass attributes as keyword arguments with a leading underscore

Special helper classes ⌘

  • CAT: not a tag. Just concatenates strings and inner tags.
  • A: way too smart. Can generate not only regular links, but also AJAX!
  • INPUT, TEXTAREA, SELECT: not only represent themselves, but also can react to user input
  • FORM: can validate itself

Exercise ⌘

  • Build a helper to display a slider from jQuery UI
    • Don't care about validation or getting the data back yet

Special methods of INPUTs ⌘

  • INPUT, TEXTAREA, SELECT constructors have keyword arguments
    • value: initial value
    • requires: None, a validator, or a list of validators

Validators ⌘

  • Tasks:
    • Convert valid input from HTML strings to appropriate Python data types
      • In other frameworks, can easily return ORM objects!
    • Provide error messages for invalid inputs
  • Callables that take a string and return a pair: (converted_value, error)
    • If converted_value is significantly different from the input (e.g. not of the same type), think about providing the formatter function that performs the required conversion back to string
  • Factories for them are in gluon.validators
    • The appropriate error messages can often be supplied as an argument, error_message='...'
      • The error message should not contain a call to T(), web2py will call it for us
      • Good - you can always override the framework-default error messages, and translate them independently from the framework
  • You can write your own

How to use validators ⌘

  • When you first call the validator class (e.g. IS_MATCH), you create an instance of the validator and thus store all tunable parameters
  • The result is also callable, and the call performs validation of the argument according to the stored parameters
  • With validator instances, you can do this:
    • Pass as requires=... keyword argument when creating a Field in a table
    • Pass as requires=... keyword argument when creating an INPUT in a form
    • Directly assign to db.tbl.field.requires
      • In all cases, a list of validators is acceptable, and each validator transforms the input for the next one

Available validators ⌘

  • See gluon.validators
  • The most important ones (that you want to insert manually) are:
    • IS_NOT_EMPTY() - self-obvious
    • CLEANUP() - removes blanks from the edges of the string, removes bad characters (those matched by the supplied regular expression)
    • IS_INT_IN_RANGE - for enforcing minimum and maximum, and disallowing non-numeric input
    • IS_MATCH() - verifies the input according to a regular expression
    • IS_LENGTH() - sets minimum and maximum length of a field
      • Or a size of the uploaded file
    • IS_IMAGE() - implements some crude checks on uploaded files, but doesn't do full decoding
    • IS_IN_SET() - useful when there are several valid options
      • When used on a field, makes the default widget a select
      • There is no IS_NOT_IN_SET()
    • IS_IN_DB() - useful for fields that reference something
      • Again, changes the default widget to a select
    • IS_NOT_IN_DB() - for avoiding duplicates

Validators that result in a select ⌘

  • IS_IN_SET(), IS_IN_DB()
    • They result in a select only when they are the first in the chain
  • HTML looks like this
<select>
  <option value="3">Business</option>
  <option selected="selected" value="2">Premium</option>
</select>
  • The value is sent to the server and validated, but we also need a human-readable representation
    • That's why the labels=... keyword argument for IS_IN_SET()
      • Can be a list of strings
    • That's why the label=... keyword argument for IS_IN_DB()
      • Can be a format string, or a lambda, or a field
  • There is also a zero='' keyword argument that adds the invalid "nothing selected" option
    • Pass a None to get rid of it

Limiting available choices for IS_IN_DB() ⌘

  • Sometimes the value should not just be in DB, but be in certain query output
    • Here is what to do: requires=IS_IN_DB(db(query), db.mytable.myfield)

Exercise ⌘

  • There is no IS_NOT_IN_SET()
    • Implement it!

Exercise ⌘

  • Write a validator for selecting some pages in the print dialog
    • Play with Chrome's dialog to gather the idea what should be valid and what shouldn't
    • The returned data should be a list of integers representing the page numbers
  • Test it

Special methods of FORMs ⌘

  • form.accepts(request.vars, ...)
    • The underlying primitive for other methods
    • Passive by default for FORM
    • Returns False if something is invalid or the form has not been submitted yet
    • Modifies the form if there are errors unless you pass hideerror=True as a keyword argument
    • Modifies the form anyway to add hidden _formkey and _formname inputs
      • That's CSRF and double-submission protection
  • form.validate()
    • Tests whether the form accepts data, then performs actions specified in onsuccess, onfailure or onchange (SQLFORM only) keyword arguments
      • Thus active!
    • Handy for flashing messages or redirecting
  • form.process()
    • Same as form.validate() but returns the form
    • Thus, also active

Whole-form validation ⌘

  • form.accepts() validates fields one-by-one
    • I.e. no whole-form validation, no validation of inter-field consistency
    • Actualy, even no concept of errors that are caused by the form as a whole and not one of its fields
  • Solution: onvalidate keyword argument
    • Also abused for external actions like sending email
    • Can be either a callable, or a list of callables, or a dict
      • The dict has to have only onsuccess, onfailure and onchange keys
        • Their values must be callables or lists of callables
    • The callables will be called after validation, with the form itself as an argument
      • A chance to add to form.errors, this will automatically mark form as invalid

Special properties of FORMs ⌘

  • form.vars
    • Validated and converted values
  • form.errors
    • Storage of errors: keys are input names, values are error messages
    • Empty if there are no errors
  • form.accepted
    • Just a boolean with obvious meaning

CSRF ⌘

  • Cross-Site Request Forgery
    • A hacker makes a form on his website
    • The button says "View my photos"
    • The form actually submits to your bank (in hope that you are logged in) and asks it to transfer $100 to the hacker
  • web2py protects against CSRF if you use FORMs and call .accepts()
    • The hacker has no access to your bank's session and thus cannot guess the correct _formkey
    • The bank will reject forms with incorrect or missing _formkey even if you are logged in

Displaying forms ⌘

  • Simplest possible method: {{=form}}
    • Requires putting LABELs in the form
  • You can write the HTML yourself, but then don't forget {{=form.hidden_fields()}}

Displaying errors ⌘

  • {{=form}} will do this for you
    • The error message will be in a <div class="errorwrapper><div class="error" id="$name__error">...</div></div>, just after the input
    • The invalidinput class will be added to the input itself
  • Or, you can do it yourself as you want, but then pass hideerror=True to form.accepts()
    • Use form.errors

More than one form per page ⌘

  • Possible!
    • Call form.accepts() for each form
    • Provide a unique formname='...' keyword argument in each call
    • This will add a hidden _formname input, and use it to see whether this is the form that the user submitted

SQLFORM ⌘

  • A way to display and validate a model-based form
  • To create a form suitable for inserting a new record into table tbl:
    • form = SQLFORM(db.tbl)
  • To create a form suitable for updating an existing record:
    • form = SQLFORM(db.tbl, record)
    • Add deletable=True to show a "Check to delete" checkbox
    • Add readonly=True to make the form read-only
  • Validators from the table definition apply

Processing SQLFORMs ⌘

  • FORM.accepts() is passive if there is no onvalidate
  • SQLFORM.accepts() and SQLFORM.process() are active: they do database I/O!
    • SQLFORM.validate() is still passive
      • That's a good thing if you need some post-processing and don't want to use onvalidate

Widgets ⌘

  • SQLFORM decides what kind of input tag to produce for each field
    • It looks at the field type and validators
    • It is possible to override the decision by adding a keyword argument, widget=..., to the field definition
    • db.tbl.field.widget = ... also works
  • Available widgets: see gluon.sqlhtml
    • Use as: gluon.sqlhtml.RadioWidget.widget

Dealing with uploads ⌘

  • web2py will display upload fields as links or inline images, depending on the extension
    • Links need to point to a controller that does the appropriate security checks and gives the content of the file
@cache.action()
def download():
    """
    allows downloading of uploaded files
    http://..../[app]/default/download/[filename]
    """
    return response.download(request, db)
  • By default, expects the filename be of the form: table.field.XXXXXX.ext
    • Parses the filename out from request.args[-1]
    • Creates a ticket if a user tries to download auth_user.email.bob@example.com.jpg
      • BUG

How widgets work ⌘

  • The widget function takes the database field, the value to display, and additional HTML attributes
  • Produces some HTML helpers

Limiting the available fields ⌘

  • Forgetting to limit the fields of autogenerated forms is a cause of many security vulnerabilities
    • Think about someone setting their own is_manager status to True
  • In field definition:
    • writable=False hides the field from create forms (update forms still show it as read-only)
    • readable=False hides the field from read-only forms, too
    • These properties can be assigned post-factum in controllers
  • In the SQLFORM call:
    • Use the fields=[...] argument

Adding extra fields ⌘

  • Pass them in the extra_fields keyword argument of SQLFORM constructor
    • They can have validators, as usual
    • You can take action upon the values in the onvalidate callback

ID tampering ⌘

  • Display an SQLFORM
  • F12 (developer tools), modify the id, attempt to submit
    • Result: ticket!
      • From a sysadmin viewpoint, this is spam. There is no fixable bug in the application, yet there is an unactionable ticket!
    • Other web frameworks have no protection against id tampering
      • Either the modified request is valid, or it will be rejected by some validator

Optimistic offline locking ⌘

  • Display an SQLFORM for editing some record
  • Call form.process(detect_record_change=True)
    • ...which is not the default
  • Load the form in two tabs
  • Try to submit both
    • The second submission won't work
    • Detection is based on MD5 sum of concatenation of all fields of the record
    • Message can be customized through message_onchange keyword argument
    • No built-in equivalent in other frameworks
      • Django: see 1, 2
        • Both based on the explicit version field
      • Flask/WTForms: could not find anything

Exercise: a real-world weird registration form ⌘

  • This comes from SafeDNS
  • They have some "billing plans"
    • A billing plan is pre-defined set of available functionality with a known price
    • Each user has a billing plan and, for non-free plans, date when the user's subscription expires (unless extended by paying money)
  • During registration, you choose your desired plan and get 15 days of subscription for free
  • It is possible to apply a promotional code during registration
    • In this case, the selected billing plan is ignored
    • The set of available promotional codes is managed by site managers
    • Each promotional code has an associated billing plan to give to the user, and the number of days to give him/her for free instead of the default 15
    • There are some billing plans that are not normally advertised, but available only via a promotional code
  • And now you have to implement user registration with all these weird requirements

Some hints ⌘

  • Look into gluon.tools.Auth.register()
    • It builds and processes the form
    • The exercise is mostly about not fighting too much with its logic
  • There is no way to add extra_fields to the form
    • So we need some other solution
  • The billing plan should be settable in the registration form but not elsewhere
    • How would our application know that it is going to show a registration form?
  • Find out how to attach onvalidation to the registration form
    • Set the expiration date there

Other SQLFORM-related constructions ⌘

  • Want a simple confirmation form?
    • There is FORM.confirm('Are you sure?')
  • Want some validated inputs not related to your database, but FORM is too low-level?
    • Use SQLFORM.factory(), supply fields as arguments
  • Want a table, not a form?
    • There is SQLTABLE
  • Want something smart, with Create, Edit and Delete buttons, and pagination?
    • Use SQLFORM.grid()

Client-side topics ⌘

  • Including CSS and JS
  • AJAX in web2py
  • Form styling
  • Autocomplete

Including CSS and JS ⌘

  • response.files.append(URL('static', 'something.js'))
  • In the template, call response.include_files()
    • So that they can be placed in the appropriate place in the document (usually the head)

Versioning static files ⌘

  • URL('static', 'something.js') does not necessarily return "/static/something.js"
  • Client-side caching is an important loading-time optimization
  • Client is told to cache all JS and CS files by setting the Expires: header to point far into the future
    • Problem: what if the file changes?
    • Solution: don't reuse the URL
    • Include some version number in it
    • web2py does this for you if you set request.static_version
      • Manually in models/db.py
      • Easy to forget
        • Solution: don't use web2py's static versioning
        • Your frontend web developer probably already has something to offer

Static asset optimization ⌘

  • It's inefficient to load many long JS/CSS files
    • Especially when @import is used in CSS, because it breaks pipelining
  • web2py can minify, concatenate and inline your static assets
    • But "inline" is used in a different meaning than usual
      • Usual meaning: replace @import rules in your CSS with actual contents of the imported files
        • This is usually wanted, so that the browser can load all the CSS in one request, cache it and not request it again anymore
      • web2py meaning: put that JS or CSS into the <script> tag, so that the src attribute is not needed anymore
        • Prevents caching, loads tons of assets with each request, don't do it
    • But the "minification" is quite limited
      • Only strips comments and unneeded whitespace
      • State-of-the-art minifiers also rename variables to shorter names and rewrite expressions to shorter equivalent forms
    • And your frontend developer has some other tricks in the bag, like web sprites, that are natively supported by his toolchain
      • Conclusion: trust him. Take minified/concatenated/compressed JS from him, let him alter references to these static assets in your templates.
        • Drawback: this adds a build step.

CSS in the welcome app ⌘

  • Bootstrap v3.3.5
  • Custom CSS that makes the calendar widget work
  • Some web2py-specific styles in web2py.css - worth looking in detail and customizing
    • These styles customize various tags, including <h1>
      • Kind of stuff that you expect to see afer a CSS reset, or similar in purpose to normalize.css
    • Many other web2py-specific styles
  • A more modern version of web2py.css is also available as web2py-bootstrap3.css
  • All of them are minified and come without sources (possible license issue)

JS in the welcome app ⌘

  • analytics.js - for Google and not only!
  • Bootstrap has some JS components
  • Calendar
  • jQuery v1.12.2
  • modernizr, respond
    • They implement new tags and CSS attributes in old browsers, and also fix their bugs
  • share.js for sharing on social networks
  • web2py.js - API to be used by applications (to be discussed later in more detail)
  • web2py-bootstrap3.js - some web2py-specific behaviour

Purpose of JS and CSS frameworks ⌘

  • Solve common problems regarding site layout and client-side functionality
    • Multi-column responsive layouts
    • Modern look and feel
    • Hiding differences between browsers' default rendering of HTML
    • Simplifying tasks such as DOM manipulation and AJAX
    • Providing new kinds of UI controls that don't exist in plain HTML5

How web2py defaults compare with the existing best practice ⌘

  • Both Bootstrap and jQuery are industry standards
  • Other industry standard CSS and JS frameworks exist
  • web2py uses some jQuery APIs internally, but can be decoupled from Bootstrap
    • It is sometimes possible to use multiple JS libraries in the same project (e.g. jQuery + React works as long as they don't modify the same DOM)
      • It's fine to use React for UI and jQuery for AJAX - that's what the official React tutorial did in the past
      • Still, some web developers may object
    • There is a lightweight alternative CSS framework by the same author as web2py, stupid.css, also available as a replacement welcome app
  • You should be aware of HTML5 boilerplate

CSS classes that must be defined ⌘

  • That's an exercise for the reader
    • web2py sometimes adds various classes to elements by itself
    • Example: the invalidinput class for invalid inputs, and errorwrapper and error for surrounding divs
    • Find them all in the source code
  • grep is your friend
    • But note that some classes appear only in circumstances where the developer asks for them
    • And some are mentioned only in examples
    • E.g. the Bootstrap-specific btn-default class is used only if the form rendering style is set to bootstrap3_stacked, and avoidable by setting it to anything else

AJAX ⌘

  • Asynchronous JS and XML
  • The underlying JS technology is XMLHttpRequest
  • jQuery contains a number of convenience wrappers

Doing AJAX with jQuery ⌘

  • It still works, and is not specific to web2py
  • $.ajax() is a general-purpose method, to be used when others are not suitable
  • $.get() and $.post() perform GET and POST requests asynchronously
    • They have various callbacks that are called when the request completes
    • Don't forget to call ajaxError(): $(document).ajaxError(function() { ... }) to be notified about errors
    • Alternatively, $.get(...).done(function() { ... }).fail(function() { ... })

Loading HTML fragments with AJAX ⌘

  • An important use case is when an AJAX request is used for loading some HTML fragment
  • The helper to use here is $.load()
  • $( "#result" ).load( "ajax/test.html" );
    • It's a good idea to add error handling, and maybe do something else (like removing the "load..." link) on success

web2py ajax function ⌘

  • Defined in web2py.js
  • ajax(url, [name1, name2, ...], target)
  • Passes the contents of input fields name1, name2, ... as parameters to a given URL via a POST request
  • Loads the result as HTML, replaces everything inside #target with it
    • For such HTML fragments, web2py uses views with the .load extension
    • Or, if the traget is ":eval", evals the returned javascript

Magic attributes of the A tag ⌘

  • To create links that do AJAX, use {{=A("...")}} with magic keyword arguments
    • cid="asd" will load the fragment linked by the _href into #asd using a GET request
    • target="asd" will also load the fragment linked by the component into #asd using a GET request
    • target="asd" will also load the fragment linked by the callback into #asd using a POST request, with no POST variables
    • delete="#qwe" noconfirm=True will also delete #qwe from the DOM once the callback completes successfully
  • To create links that look like buttons, add _class="btn" (Bootstrap-specific)

LOAD and components ⌘

  • Components are self-contained pieces of an application that can be placed into a DIV
    • Components can contain submittable forms
    • To create a component, use views with the .load extension
  • Load components like this: {{=LOAD(controller, function, ajax=True, ajax_trap=True)}}
    • ajax_trap=True traps all links and forms inside the component, so that they only reload the component and not the whole page
    • ajax=False is the default, will run that other controller and paste the resulting HTML inline
    • ajax=False will request that other controller using AJAX. Implies ajax_trap=True

Deployment ⌘

  • Proxying
  • Apache with mod_wsgi
  • nginx with uwsgi
    • Not covered here - evolves too quickly

Deployment via proxying ⌘

  • Create the parameters_8000.py file with the hashed admin password
  • Create a systemd service file for web2py.py
    • Find out the correct flags for non-interactive use
    • Most likely, python2 web2py.py --nogui -p 8000 -a '<recycle>' -e
    • Web2py will run on 127.0.0.1:8000
  • Set up your web server to proxy the requests
  • Let it set X-Forwarded-For and X-Forwarded-Proto headers
    • Unset them in incoming requests!

Deployment via mod_wsgi ⌘

  • Create the parameters_8000.py file with the hashed admin password
  • Copy handlers/wsgihandler.py to the same directory as web2py.py
  • Minimal example configuration:
# On Debian/Ubuntu, use "a2enmod wsgi" instead
LoadModule wsgi_module modules/mod_wsgi.so

WSGIDaemonProcess default processes=2 threads=15 user=... group=...
WSGIScriptAlias / /path/to/web2py/wsgihandler.py
WSGIProcessGroup default

<Directory /path/to/web2py>
    Require all granted
</Directory>

Possible improvements ⌘

  • More restrictive grant (so that there is no chance that the password, sqlite database, or private files are served)
<Directory /path/to/web2py>
    Require all denied
    <Files wsgihandler.py>
        Require all granted
    </Files>
</Directory>
  • Serve static files via Apache
# On Debian/Ubuntu use "a2enmod expires" instead
LoadModule expires_module modules/mod_expires.so

AliasMatch ^/([^/]+)/static/(?:_[\d]+.[\d]+.[\d]+/)?(.*) /path/to/web2py/applications/$1/static/$2
<Directory /path/to/web2py/applications/*/static/>
    Require all granted

    # ONLY if you have enabled static versioning!
    ExpiresActive On
    ExpiresDefault 
</Directory>