piss

entries

Mini NES Electronics Kit Status Update
daftmike's blog 1970-08-29T13:08:00+00:00
Mini NES Electronics Kit Status Update #2
daftmike's blog 1970-09-13T14:24:00+00:00
Shipping delay update :(
daftmike's blog 1970-10-30T12:14:00+00:00
Api
Neovim 2001-01-01T00:00:00+00:00
Api-ui-events
Neovim 2001-01-01T00:00:00+00:00
Autocmd
Neovim 2001-01-01T00:00:00+00:00
Backers
Neovim 2001-01-01T00:00:00+00:00
Change
Neovim 2001-01-01T00:00:00+00:00
Channel
Neovim 2001-01-01T00:00:00+00:00
Cmdline
Neovim 2001-01-01T00:00:00+00:00
Credits
Neovim 2001-01-01T00:00:00+00:00
Debug
Neovim 2001-01-01T00:00:00+00:00
Deprecated
Neovim 2001-01-01T00:00:00+00:00
Dev
Neovim 2001-01-01T00:00:00+00:00
Dev_arch
Neovim 2001-01-01T00:00:00+00:00
Dev_style
Neovim 2001-01-01T00:00:00+00:00
Dev_test
Neovim 2001-01-01T00:00:00+00:00
Dev_theme
Neovim 2001-01-01T00:00:00+00:00
Dev_tools
Neovim 2001-01-01T00:00:00+00:00
Dev_vimpatch
Neovim 2001-01-01T00:00:00+00:00
Develop
Neovim 2001-01-01T00:00:00+00:00
Diagnostic
Neovim 2001-01-01T00:00:00+00:00
Diff
Neovim 2001-01-01T00:00:00+00:00
Digraph
Neovim 2001-01-01T00:00:00+00:00
Editing
Neovim 2001-01-01T00:00:00+00:00
Editorconfig
Neovim 2001-01-01T00:00:00+00:00
Faq
Neovim 2001-01-01T00:00:00+00:00
Filetype
Neovim 2001-01-01T00:00:00+00:00
Fold
Neovim 2001-01-01T00:00:00+00:00
Ft_ada
Neovim 2001-01-01T00:00:00+00:00
Ft_hare
Neovim 2001-01-01T00:00:00+00:00
Ft_ps1
Neovim 2001-01-01T00:00:00+00:00
Ft_raku
Neovim 2001-01-01T00:00:00+00:00
Ft_rust
Neovim 2001-01-01T00:00:00+00:00
Ft_sql
Neovim 2001-01-01T00:00:00+00:00
Gui
Neovim 2001-01-01T00:00:00+00:00
Health
Neovim 2001-01-01T00:00:00+00:00
Helphelp
Neovim 2001-01-01T00:00:00+00:00
If_perl
Neovim 2001-01-01T00:00:00+00:00
If_pyth
Neovim 2001-01-01T00:00:00+00:00
If_ruby
Neovim 2001-01-01T00:00:00+00:00
Indent
Neovim 2001-01-01T00:00:00+00:00
Index
Neovim 2001-01-01T00:00:00+00:00
Insert
Neovim 2001-01-01T00:00:00+00:00
Intro
Neovim 2001-01-01T00:00:00+00:00
Job_control
Neovim 2001-01-01T00:00:00+00:00
L10n-arabic
Neovim 2001-01-01T00:00:00+00:00
L10n-hebrew
Neovim 2001-01-01T00:00:00+00:00
L10n-russian
Neovim 2001-01-01T00:00:00+00:00
L10n-vietnamese
Neovim 2001-01-01T00:00:00+00:00
Lsp
Neovim 2001-01-01T00:00:00+00:00
Lua
Neovim 2001-01-01T00:00:00+00:00
Lua-bit
Neovim 2001-01-01T00:00:00+00:00
Lua-guide
Neovim 2001-01-01T00:00:00+00:00
Lua-plugin
Neovim 2001-01-01T00:00:00+00:00
Luaref
Neovim 2001-01-01T00:00:00+00:00
Luvref
Neovim 2001-01-01T00:00:00+00:00
Map
Neovim 2001-01-01T00:00:00+00:00
Mbyte
Neovim 2001-01-01T00:00:00+00:00
Message
Neovim 2001-01-01T00:00:00+00:00
Mlang
Neovim 2001-01-01T00:00:00+00:00
Motion
Neovim 2001-01-01T00:00:00+00:00
News
Neovim 2001-01-01T00:00:00+00:00
News-0.10
Neovim 2001-01-01T00:00:00+00:00
News-0.11
Neovim 2001-01-01T00:00:00+00:00
News-0.12
Neovim 2001-01-01T00:00:00+00:00
News-0.9
Neovim 2001-01-01T00:00:00+00:00
Nvim
Neovim 2001-01-01T00:00:00+00:00
Nvim_terminal_emulator
Neovim 2001-01-01T00:00:00+00:00
Options
Neovim 2001-01-01T00:00:00+00:00
Pack
Neovim 2001-01-01T00:00:00+00:00
Pattern
Neovim 2001-01-01T00:00:00+00:00
Pi_gzip
Neovim 2001-01-01T00:00:00+00:00
Pi_msgpack
Neovim 2001-01-01T00:00:00+00:00
Pi_paren
Neovim 2001-01-01T00:00:00+00:00
Pi_spec
Neovim 2001-01-01T00:00:00+00:00
Pi_tar
Neovim 2001-01-01T00:00:00+00:00
Pi_tutor
Neovim 2001-01-01T00:00:00+00:00
Pi_zip
Neovim 2001-01-01T00:00:00+00:00
Plugins
Neovim 2001-01-01T00:00:00+00:00
Provider
Neovim 2001-01-01T00:00:00+00:00
Quickfix
Neovim 2001-01-01T00:00:00+00:00
Quickref
Neovim 2001-01-01T00:00:00+00:00
Recover
Neovim 2001-01-01T00:00:00+00:00
Remote
Neovim 2001-01-01T00:00:00+00:00
Remote_plugin
Neovim 2001-01-01T00:00:00+00:00
Repeat
Neovim 2001-01-01T00:00:00+00:00
Rileft
Neovim 2001-01-01T00:00:00+00:00
Scroll
Neovim 2001-01-01T00:00:00+00:00
Sign
Neovim 2001-01-01T00:00:00+00:00
Spell
Neovim 2001-01-01T00:00:00+00:00
Starting
Neovim 2001-01-01T00:00:00+00:00
Support
Neovim 2001-01-01T00:00:00+00:00
Syntax
Neovim 2001-01-01T00:00:00+00:00
Tabpage
Neovim 2001-01-01T00:00:00+00:00
Tagsrch
Neovim 2001-01-01T00:00:00+00:00
Term
Neovim 2001-01-01T00:00:00+00:00
Terminal
Neovim 2001-01-01T00:00:00+00:00
Tips
Neovim 2001-01-01T00:00:00+00:00
Treesitter
Neovim 2001-01-01T00:00:00+00:00
Tui
Neovim 2001-01-01T00:00:00+00:00
Uganda
Neovim 2001-01-01T00:00:00+00:00
Ui
Neovim 2001-01-01T00:00:00+00:00
Undo
Neovim 2001-01-01T00:00:00+00:00
Userfunc
Neovim 2001-01-01T00:00:00+00:00
Usr_01
Neovim 2001-01-01T00:00:00+00:00
Usr_02
Neovim 2001-01-01T00:00:00+00:00
Usr_03
Neovim 2001-01-01T00:00:00+00:00
Usr_04
Neovim 2001-01-01T00:00:00+00:00
Usr_05
Neovim 2001-01-01T00:00:00+00:00
Usr_06
Neovim 2001-01-01T00:00:00+00:00
Usr_07
Neovim 2001-01-01T00:00:00+00:00
Usr_08
Neovim 2001-01-01T00:00:00+00:00
Usr_09
Neovim 2001-01-01T00:00:00+00:00
Usr_10
Neovim 2001-01-01T00:00:00+00:00
Usr_11
Neovim 2001-01-01T00:00:00+00:00
Usr_12
Neovim 2001-01-01T00:00:00+00:00
Usr_20
Neovim 2001-01-01T00:00:00+00:00
Usr_21
Neovim 2001-01-01T00:00:00+00:00
Usr_22
Neovim 2001-01-01T00:00:00+00:00
Usr_23
Neovim 2001-01-01T00:00:00+00:00
Usr_24
Neovim 2001-01-01T00:00:00+00:00
Usr_25
Neovim 2001-01-01T00:00:00+00:00
Usr_26
Neovim 2001-01-01T00:00:00+00:00
Usr_27
Neovim 2001-01-01T00:00:00+00:00
Usr_28
Neovim 2001-01-01T00:00:00+00:00
Usr_29
Neovim 2001-01-01T00:00:00+00:00
Usr_30
Neovim 2001-01-01T00:00:00+00:00
Usr_31
Neovim 2001-01-01T00:00:00+00:00
Usr_32
Neovim 2001-01-01T00:00:00+00:00
Usr_40
Neovim 2001-01-01T00:00:00+00:00
Usr_41
Neovim 2001-01-01T00:00:00+00:00
Usr_42
Neovim 2001-01-01T00:00:00+00:00
Usr_43
Neovim 2001-01-01T00:00:00+00:00
Usr_44
Neovim 2001-01-01T00:00:00+00:00
Usr_45
Neovim 2001-01-01T00:00:00+00:00
Usr_toc
Neovim 2001-01-01T00:00:00+00:00
Various
Neovim 2001-01-01T00:00:00+00:00
Vi_diff
Neovim 2001-01-01T00:00:00+00:00
Vim_diff
Neovim 2001-01-01T00:00:00+00:00
Vimeval
Neovim 2001-01-01T00:00:00+00:00
Vimfn
Neovim 2001-01-01T00:00:00+00:00
Visual
Neovim 2001-01-01T00:00:00+00:00
Vvars
Neovim 2001-01-01T00:00:00+00:00
Windows
Neovim 2001-01-01T00:00:00+00:00
About
Neovim 2001-01-01T00:00:00+00:00
Build
Neovim 2001-01-01T00:00:00+00:00
Helptag redirect
Neovim 2001-01-01T00:00:00+00:00
Install
Neovim 2001-01-01T00:00:00+00:00
News archive
Neovim 2001-01-01T00:00:00+00:00
Roadmap
Neovim 2001-01-01T00:00:00+00:00
Screenshots
Neovim 2001-01-01T00:00:00+00:00
Sponsors
Neovim 2001-01-01T00:00:00+00:00
Vision
Neovim 2001-01-01T00:00:00+00:00
Subspace / Continuum History
Dan Luu 2006-02-01T00:00:00+00:00
History of Symbolics lisp machines
Dan Luu 2007-11-16T00:00:00+00:00
Entourage + Applescript = Frustration
Steve Losh 2008-02-21T15:25:45+00:00
Work-life balance at Bioware
Dan Luu 2008-05-31T00:00:00+00:00
Site Redesign
Steve Losh 2009-01-11T17:58:23+00:00
Going Open Source
Steve Losh 2009-01-13T20:08:56+00:00
Deploying with Fabric & Mercurial
Steve Losh 2009-01-15T20:51:09+00:00
How & Why I DJ
Steve Losh 2009-02-06T17:53:44+00:00
BumpMapping hell
Fabien Sanglard 2009-03-04T04:33:27+00:00
Mercurial Bash Prompts
Steve Losh 2009-03-17T21:34:55+00:00
Candy Colored Terminal
Steve Losh 2009-03-18T18:26:28+00:00
Fluide
Fabien Sanglard 2009-04-15T04:33:27+00:00
Fluid v1.1 up and coming...
Fabien Sanglard 2009-05-09T04:33:27+00:00
Wolfenstein 3D for iPhone code review
Fabien Sanglard 2009-05-09T04:33:27+00:00
Fluid: 1,000,000 downloads !!
Fabien Sanglard 2009-05-14T04:33:27+00:00
What I Hate About Mercurial
Steve Losh 2009-05-29T19:51:05+00:00
How to Contribute to Mercurial
Steve Losh 2009-06-01T20:09:44+00:00
Fluid2 RELEASED ! Fluid 1 now at 3,000,000 downloads !!
Fabien Sanglard 2009-06-09T04:33:27+00:00
Fluid speed issues!
Fabien Sanglard 2009-06-29T04:33:27+00:00
A Guide to Branching in Mercurial
Steve Losh 2009-08-30T20:27:12+00:00
Armadillo Space T-shirt
Fabien Sanglard 2009-10-14T04:33:27+00:00
iPhone 3D engine programming part 1
Fabien Sanglard 2009-10-19T04:33:27+00:00
Apple iPhone Tech Talk 2009 tricks and treats
Fabien Sanglard 2009-12-03T04:33:27+00:00
Don't learn Assembly on Mac OS X
Fabien Sanglard 2009-12-31T04:33:27+00:00
Are closed social networks inevitable?
Dan Luu 2010-01-01T00:00:00+00:00
How does Boston compare to SV and what do MIT and Stanford have to do with it?
Dan Luu 2010-01-01T00:00:00+00:00
Doom engine 1993 code review
Fabien Sanglard 2010-01-13T04:33:27+00:00
Moving from Django to Hyde
Steve Losh 2010-01-15T20:14:00+00:00
The Real Difference Between Mercurial and Git
Steve Losh 2010-01-20T21:56:00+00:00
My Extravagant Zsh Prompt
Steve Losh 2010-02-01T01:05:00+00:00
Doom iPhone code review
Fabien Sanglard 2010-02-01T04:33:27+00:00
Momentary latching circuit
daftmike's blog 2010-02-12T13:52:00+00:00
Low-battery indicator circuit
daftmike's blog 2010-02-18T18:50:00+00:00
Mercurial Workflows: Branch As Needed
Steve Losh 2010-02-28T14:00:00+00:00
How to build a circuit on Veroboard
daftmike's blog 2010-03-02T20:31:00+00:00
PSOne screen led-mod
daftmike's blog 2010-03-10T10:15:00+00:00
My Darling Dreamcast
daftmike's blog 2010-03-20T15:14:00+00:00
It's not even the beginning of the end...
daftmike's blog 2010-03-27T14:13:00+00:00
My Backlit Dreamcast VMU
daftmike's blog 2010-04-09T15:28:00+00:00
A Faster Feed Apart
Steve Losh 2010-04-30T22:55:00+00:00
Mercurial Workflows: Stable & Default
Steve Losh 2010-05-17T18:27:00+00:00
Tracing the baseband
Fabien Sanglard 2010-05-27T18:08:27+00:00
Mercurial Workflows: Translation Branches
Steve Losh 2010-06-11T08:15:00+00:00
A Git User's Guide to Mercurial Queues
Steve Losh 2010-08-10T21:00:00+00:00
Coming Home to Vim
Steve Losh 2010-09-20T18:15:00+00:00
Wii Sensor Bar Projector
daftmike's blog 2010-11-24T14:48:00+00:00
All about the fillrate
Fabien Sanglard 2010-12-11T21:36:45+00:00
SHMUP Lite
Fabien Sanglard 2010-12-19T21:36:45+00:00
Oh yes I'm working on new stuff...
Plogue R&D 2010-12-23T19:48:00+00:00
To become a good C programmer
Fabien Sanglard 2011-02-02T21:36:45+00:00
patience
Plogue R&D 2011-02-04T21:17:00+00:00
Lamer Exterminator, or how a 22 year old malware can still piss you off.
Plogue R&D 2011-02-19T20:03:00+00:00
To generate 60fps videos on iOS
Fabien Sanglard 2011-02-21T21:36:45+00:00
dEngine Source Code Released
Fabien Sanglard 2011-04-28T21:36:45+00:00
The Great AdLib Fire ...
Plogue R&D 2011-05-04T00:58:00+00:00
Does ... not ... compute...
Plogue R&D 2011-05-10T14:04:00+00:00
Going Paper-Free for $220
Steve Losh 2011-05-26T13:44:00+00:00
The reluctant US SMS that didn't want to be Japanese
Plogue R&D 2011-06-07T01:46:00+00:00
soldiering on.
What Was Found 2011-06-09T20:33:17+00:00
(SAFE) US SMS Japanese Mod in action
Plogue R&D 2011-06-24T17:40:00+00:00
Playing the revolution/Home Computer Invasion Documentaries in Trouble???
Plogue R&D 2011-06-25T19:20:00+00:00
Polygon Codec
Fabien Sanglard 2011-06-26T21:36:45+00:00
Esperanto
What Was Found 2011-06-28T20:24:22+00:00
SHMUP Source Code
Fabien Sanglard 2011-06-30T07:36:45+00:00
Django Advice
Steve Losh 2011-06-30T08:30:00+00:00
It’s the thought that doesn’t count
What Was Found 2011-06-30T21:12:49+00:00
Seriously?
What Was Found 2011-07-07T21:42:12+00:00
Concerning monsters
What Was Found 2011-07-14T01:12:25+00:00
Hacker Monthly publication
Fabien Sanglard 2011-07-15T01:36:45+00:00
Dear dude that sneaks into my room at night and leaves recorded stories
What Was Found 2011-07-22T23:31:12+00:00
For the Record (pun intended)
What Was Found 2011-07-26T19:06:29+00:00
Your next meal will taste great
What Was Found 2011-07-27T22:23:00+00:00
Analog TV Death toll
Plogue R&D 2011-09-01T13:37:00+00:00
Writing Vim Plugins
Steve Losh 2011-09-06T09:13:00+00:00
Solving Ghost in The Wire codes
Fabien Sanglard 2011-09-08T01:36:45+00:00
Solving Ghost in The Wire codes
Fabien Sanglard 2011-09-11T01:08:45+00:00
Quake 2 Source Code Review
Fabien Sanglard 2011-09-20T01:08:45+00:00
Building a Cube64... part 1
daftmike's blog 2011-10-09T19:47:00+00:00
Arcade Restoration - Week1: Acquisition
Plogue R&D 2011-10-14T09:38:00+00:00
My weapons shed and a 360 degree C4 minefield
What Was Found 2011-10-20T16:55:47+00:00
untitled
What Was Found 2011-10-20T19:00:20+00:00
JAMMA Space Invaders experiment.
Plogue R&D 2011-11-03T16:58:00+00:00
How to build Doom3 on Mac OS X with XCode
Fabien Sanglard 2011-11-25T01:08:45+00:00
Another World Code Review
Fabien Sanglard 2011-11-27T01:08:45+00:00
Progressive playback: An atom story
Fabien Sanglard 2011-11-27T01:08:45+00:00
AY8930 sourced!
Plogue R&D 2011-12-20T16:34:00+00:00
AY8930 Initial tests!
Plogue R&D 2011-12-22T19:54:00+00:00
About this dev blog
Evennia Devblog RSS Feed 2012-02-05T00:00:00+00:00
Evennia's open bottlenecks
Evennia Devblog RSS Feed 2012-02-05T00:00:00+00:00
New Scope
Plogue R&D 2012-02-08T15:26:00+00:00
Such a small thing ...
Evennia Devblog RSS Feed 2012-02-15T00:00:00+00:00
Commands and you
Evennia Devblog RSS Feed 2012-02-17T00:00:00+00:00
Tutorial MUD, part 1: Environment setup
TutorialMUD - pileborg.se 2012-02-18T18:28:12+00:00
Tutorial MUD, part 1.5: Makefile dependencies
TutorialMUD - pileborg.se 2012-02-21T20:47:35+00:00
Dummies doing dummy things
Evennia Devblog RSS Feed 2012-02-22T00:00:00+00:00
Tutorial MUD, part 2: Logging
TutorialMUD - pileborg.se 2012-02-22T19:19:58+00:00
Android Shmup
Fabien Sanglard 2012-02-23T01:08:45+00:00
Tutorial MUD, part 3: Argument parsing
TutorialMUD - pileborg.se 2012-03-04T10:33:41+00:00
Tutorial MUD, part 4: Mainloop and signals
TutorialMUD - pileborg.se 2012-03-10T15:38:52+00:00
SSD reboot your thinking
Fabien Sanglard 2012-03-17T01:08:45+00:00
Jonathan Shapiro's Retrospective Thoughts on BitC
Dan Luu 2012-03-23T00:00:00+00:00
Tutorial MUD, part 5: Networking, part 1
TutorialMUD - pileborg.se 2012-03-24T08:57:35+00:00
Shortcuts to goodness
Evennia Devblog RSS Feed 2012-03-26T00:00:00+00:00
Be A Donor
Fabien Sanglard 2012-04-22T01:08:45+00:00
TutorialMUD hiatus
TutorialMUD - pileborg.se 2012-04-22T12:49:04+00:00
Volatile Software
Steve Losh 2012-04-23T14:00:00+00:00
Why Go?
Nathan Youngman 2012-05-07T00:00:00+00:00
Cracking Kevin Mitnick's Ghost In The Wires Paperback Edition
Fabien Sanglard 2012-05-09T01:08:45+00:00
Address Sniffing an EPROM
Plogue R&D 2012-05-21T19:44:00+00:00
Dummies doing (even more) dummy things
Evennia Devblog RSS Feed 2012-05-30T00:00:00+00:00
Doom3 Source Code Review
Fabien Sanglard 2012-06-08T01:08:45+00:00
Coding from the inside
Evennia Devblog RSS Feed 2012-06-11T00:00:00+00:00
Extending time and details
Evennia Devblog RSS Feed 2012-06-26T00:00:00+00:00
Oculus RIFT development
Fabien Sanglard 2012-06-30T01:08:45+00:00
Quake 3 Source Code Review
Fabien Sanglard 2012-06-30T01:08:45+00:00
The Caves of Clojure: Part 1
Steve Losh 2012-07-07T17:00:00+00:00
The Caves of Clojure: Part 2
Steve Losh 2012-07-08T09:26:00+00:00
The Caves of Clojure: Part 3.1
Steve Losh 2012-07-09T09:37:00+00:00
The Caves of Clojure: Part 3.2
Steve Losh 2012-07-10T10:04:00+00:00
The Caves of Clojure: Part 3.3
Steve Losh 2012-07-11T09:25:00+00:00
The Caves of Clojure: Part 3.4
Steve Losh 2012-07-11T12:02:00+00:00
The Caves of Clojure: Part 4
Steve Losh 2012-07-12T09:42:00+00:00
The Caves of Clojure: Part 5
Steve Losh 2012-07-13T10:55:00+00:00
The Caves of Clojure: Interlude 1
Steve Losh 2012-07-14T17:06:00+00:00
NES eprom carts
Plogue R&D 2012-07-25T13:43:00+00:00
The Caves of Clojure: Part 6
Steve Losh 2012-07-30T09:50:00+00:00
Namcot163 Dual 27C020 Eprom Cart
Plogue R&D 2012-08-06T20:38:00+00:00
Taking command
Evennia Devblog RSS Feed 2012-08-16T00:00:00+00:00
Combining Twisted and Django
Evennia Devblog RSS Feed 2012-08-31T00:00:00+00:00
Galaxian's digital oscillator explained.
Plogue R&D 2012-09-08T20:41:00+00:00
The Homely Mutt
Steve Losh 2012-10-01T10:30:00+00:00
A Modern Space Cadet
Steve Losh 2012-10-03T09:55:00+00:00
Community interest
Evennia Devblog RSS Feed 2012-10-05T00:00:00+00:00
The Caves of Clojure: Part 7.1
Steve Losh 2012-10-15T09:50:00+00:00
Evennia changes to BSD license
Evennia Devblog RSS Feed 2012-10-28T00:00:00+00:00
The future of TutorialMUD
TutorialMUD - pileborg.se 2012-11-10T05:35:00+00:00
on the banks of the O-rontes
Kooneiform 2012-12-07T06:45:43+00:00
Game timers: Issues and solutions
Fabien Sanglard 2012-12-25T01:08:45+00:00
📕 Reviewing Practical Object-Oriented Design
Nathan Youngman 2013-01-10T00:00:00+00:00
Go Object Oriented Design
Nathan Youngman 2013-01-14T00:00:00+00:00
Duke Nukem 3D Code Review
Fabien Sanglard 2013-01-17T01:08:45+00:00
The best Tech books
Fabien Sanglard 2013-01-17T01:08:45+00:00
Soldering '80
Plogue R&D 2013-01-20T02:10:00+00:00
Fallout 3 – Edges
Simonschreibt. 2013-01-20T23:12:00+00:00
Teleglitch – Viewcones
Simonschreibt. 2013-01-21T19:49:00+00:00
Teleglitch – RGB Flickering
Simonschreibt. 2013-01-21T19:53:00+00:00
Diablo 3 – Trees
Simonschreibt. 2013-01-21T19:55:00+00:00
Warcraft 3 – Billboards
Simonschreibt. 2013-01-21T19:58:00+00:00
Divine Divinity – 2D Reflexion
Simonschreibt. 2013-01-21T20:00:00+00:00
Cel Shading
Simonschreibt. 2013-01-21T20:01:00+00:00
Deus Ex – Occlusion
Simonschreibt. 2013-01-21T20:03:00+00:00
Reverse Engineer Strike Commander
Fabien Sanglard 2013-01-22T01:08:45+00:00
Deus Ex 3 – Folds
Simonschreibt. 2013-01-22T21:03:00+00:00
Deus Ex – Scanlines
Simonschreibt. 2013-01-22T22:38:00+00:00
World of Warcraft – Balloon
Simonschreibt. 2013-01-23T22:38:00+00:00
Assassins Creed 3 – Windows
Simonschreibt. 2013-01-24T22:27:00+00:00
Good advice
Kooneiform 2013-01-25T19:38:16+00:00
Assassins Creed 3 – LoD Blending
Simonschreibt. 2013-01-27T22:40:00+00:00
a micro mud?
Kooneiform 2013-01-28T00:36:15+00:00
Kid Icarus – Tricks
Simonschreibt. 2013-01-28T22:45:00+00:00
Churning behind the scenes
Evennia Devblog RSS Feed 2013-01-29T00:00:00+00:00
First, there was the wheel. Then there was another wheel.
Kooneiform 2013-02-01T03:30:01+00:00
Left 4 Dead 2 – Puke
Simonschreibt. 2013-02-01T20:25:00+00:00
Sacred 2 – Crystal Reflexion
Simonschreibt. 2013-02-03T20:47:00+00:00
Sacred 2 – Pulse Shader
Simonschreibt. 2013-02-04T22:37:00+00:00
A single-threaded multiplexing server in Clojure, first attempt
Kooneiform 2013-02-05T06:17:31+00:00
Function Types in Go (golang)
jordan orelli 2013-02-05T19:53:00+00:00
Battlefield Bad Company 2 – Smoke Column
Simonschreibt. 2013-02-11T13:08:00+00:00
Kara Swisher interview of Jack Dorsey
Dan Luu 2013-02-12T00:00:00+00:00
server in Clojure, second attempt
Kooneiform 2013-02-13T05:23:27+00:00
Battlefield 2 – Flag Sound
Simonschreibt. 2013-02-13T07:50:00+00:00
Sacred 2 – Burning Map
Simonschreibt. 2013-02-15T14:29:00+00:00
Assassins Creed 3 – Bouncing Light
Simonschreibt. 2013-02-19T09:52:00+00:00
one
jordan orelli 2013-02-20T02:49:01+00:00
two
jordan orelli 2013-02-20T05:03:24+00:00
Airborn – Trees
Simonschreibt. 2013-02-21T23:03:00+00:00
color study
jordan orelli 2013-02-22T06:27:00+00:00
1943 – Retro Shadows
Simonschreibt. 2013-02-28T22:34:00+00:00
growth
jordan orelli 2013-03-01T02:50:27+00:00
growth outline
jordan orelli 2013-03-01T05:04:41+00:00
server in Clojure, third attempt, solely for posterity’s sake
Kooneiform 2013-03-03T17:15:01+00:00
server in Clojure, fourth attempt
Kooneiform 2013-03-04T02:11:42+00:00
Metal Gear Rising – Slicing
Simonschreibt. 2013-03-04T20:54:00+00:00
Latency mitigation strategies (by John Carmack)
Dan Luu 2013-03-05T00:00:00+00:00
Dead Space 3 – Diffuse Reflections
Simonschreibt. 2013-03-10T20:24:00+00:00
Who was Rolindar?
Kooneiform 2013-03-11T02:35:48+00:00
intro to 3d
jordan orelli 2013-03-11T04:19:57+00:00
Homeworld 2: Backgrounds
Simonschreibt. 2013-03-15T20:08:00+00:00
Homeworld 2 – Backgrounds Tech
Simonschreibt. 2013-03-17T22:46:00+00:00
making roguelikes with Clojure
Kooneiform 2013-03-22T05:22:56+00:00
Diablo 3 – Resource Bubbles
Simonschreibt. 2013-03-25T20:50:00+00:00
List Out of Lambda
Steve Losh 2013-03-30T14:00:00+00:00
007 Legends – The World
Simonschreibt. 2013-04-01T22:12:00+00:00
hexes
jordan orelli 2013-04-03T03:02:00+00:00
Homeworld 2 – Engines
Simonschreibt. 2013-04-05T23:01:00+00:00
Git Koans
Steve Losh 2013-04-08T10:16:00+00:00
Homeworld 2 – Hyperspace
Simonschreibt. 2013-04-14T18:41:00+00:00
Bioshock – Glossiness
Simonschreibt. 2013-04-19T20:54:00+00:00
Starcraft 2 – Localization
Simonschreibt. 2013-04-24T17:54:00+00:00
Doom 3 – Modding Notes
Simonschreibt. 2013-04-30T11:28:00+00:00
Doom 3 – Volumetric Glow
Simonschreibt. 2013-05-01T20:51:00+00:00
flap ya wings, little boids
jordan orelli 2013-05-07T02:07:22+00:00
Shadow World instead of TutorialMUD?
TutorialMUD - pileborg.se 2013-05-07T17:30:35+00:00
Doom 3 – HDUI
Simonschreibt. 2013-05-09T21:00:00+00:00
Meridian
worst bedtime stories 2013-05-12T23:34:03+00:00
One to Many
Evennia Devblog RSS Feed 2013-05-13T00:00:00+00:00
Photo
Infraspace 2013-05-13T00:57:00+00:00
Scribble Cel
Simonschreibt. 2013-05-15T21:15:00+00:00
Lego Batman – Crawler
Simonschreibt. 2013-05-21T20:00:00+00:00
Doom3 BFG Code Review
Fabien Sanglard 2013-05-23T01:08:45+00:00
Dead kids, dead animals, and other such jollity
How to Spot a Psychopath 2013-05-25T03:12:59+00:00
picked up an ewi over the weekend, figured out how to play the...
jordan orelli 2013-05-31T02:01:07+00:00
An excuse to use that spider photo again
How to Spot a Psychopath 2013-06-03T01:04:09+00:00
Small ridiculous object du jour
How to Spot a Psychopath 2013-06-04T05:04:36+00:00
Dungeon Keeper 2 – Walls
Simonschreibt. 2013-06-05T22:23:00+00:00
GBA SP Speaker Impulse Response
Plogue R&D 2013-06-06T14:30:00+00:00
"I will not buy this record, it is the wax tadpole."
How to Spot a Psychopath 2013-06-12T02:59:27+00:00
Prince Of Persia Code Review
Fabien Sanglard 2013-06-14T01:08:45+00:00
My ever-vigilant Perpetual-Motion-Claims Patrol
How to Spot a Psychopath 2013-06-14T07:36:43+00:00
Lego – Studs
Simonschreibt. 2013-06-21T22:53:00+00:00
Zaps and bangs
How to Spot a Psychopath 2013-06-26T08:55:30+00:00
Making arcade cabinet impulse responses.
Plogue R&D 2013-06-29T14:02:00+00:00
1nsane Carpet 2 – Repetitive Worlds
Simonschreibt. 2013-07-08T19:46:00+00:00
Track-Best Library Updated
nklein software 2013-07-08T22:48:25+00:00
A metallurgical detective story
How to Spot a Psychopath 2013-07-10T12:08:10+00:00
Binding of Isaac – Composition
Simonschreibt. 2013-07-15T20:46:00+00:00
Inverse functions with fixed-points
nklein software 2013-07-18T15:44:43+00:00
IRC Graphs
nklein software 2013-07-24T05:38:01+00:00
Nnnnnnnnnnnyeeeeeowwww
How to Spot a Psychopath 2013-07-28T04:02:08+00:00
I’ve been replaying Earthbound since its rerelease on the...
Zac Gorman 2013-08-07T19:14:00+00:00
Mega Man X piece for this year’s Fangamer X Attract Mode...
Zac Gorman 2013-08-08T15:31:00+00:00
Company of Heroes – Shaded Smoke
Simonschreibt. 2013-08-09T20:23:00+00:00
The Bowling Game Kata in Functional Common Lisp
nklein software 2013-08-14T23:47:06+00:00
Second Reality Code Review
Fabien Sanglard 2013-08-16T01:08:45+00:00
Magical Game Time Vol.1 is now AVAILABLE! MY BOOK IS FINALLY...
Zac Gorman 2013-08-16T22:54:41+00:00
All that glisters
How to Spot a Psychopath 2013-08-18T09:20:38+00:00
Company of Heroes – Flamethrower
Simonschreibt. 2013-08-20T19:23:00+00:00
Doom III BFG Documentation
Fabien Sanglard 2013-08-31T01:08:45+00:00
About danluu.com
Dan Luu 2013-09-01T00:00:00+00:00
Reports of MWO's death have been somewhat exaggerated
How to Spot a Psychopath 2013-09-03T03:39:56+00:00
Teach, Don't Tell
Steve Losh 2013-09-03T10:55:00+00:00
More Doom III BFG Documentation
Fabien Sanglard 2013-09-04T01:08:45+00:00
Verilog is weird
Dan Luu 2013-09-07T00:00:00+00:00
Lines Are Big Circles
nklein software 2013-09-13T17:18:10+00:00
Writing safe Verilog
Dan Luu 2013-09-15T00:00:00+00:00
On killing numerous aliens with a rubber-band gun
How to Spot a Psychopath 2013-09-17T03:49:37+00:00
You found me!
Simonschreibt. 2013-09-20T22:54:22+00:00
Decyphering the Business Card Raytracer
Fabien Sanglard 2013-09-21T01:08:45+00:00
self portrait
jordan orelli 2013-09-23T14:41:00+00:00
World of Torch Siege – Blended Trunks
Simonschreibt. 2013-09-27T19:05:09+00:00
Randomize HN
Dan Luu 2013-10-04T00:00:00+00:00
Learning Legendary Hardware
Fabien Sanglard 2013-10-07T01:08:45+00:00
Super Hot – Turn-based Action
Simonschreibt. 2013-10-08T20:10:21+00:00
chipcrusher re-sampling vs frequency response
Plogue R&D 2013-10-10T18:03:00+00:00
Power-On Self-Test...
int10h.org - VileR's blog 2013-10-10T23:06:43+00:00
Using Photoshop as a CGA Bitmap Paint Program
int10h.org - VileR's blog 2013-10-12T19:38:27+00:00
The Mazes of Shamus - IBM PC Version
int10h.org - VileR's blog 2013-10-21T04:28:51+00:00
Mega Evolutions
Zac Gorman 2013-10-21T17:53:31+00:00
A list of Evennia topics
Evennia Devblog RSS Feed 2013-10-22T00:00:00+00:00
Little morning warm up drawing feat. some characters from Night...
Zac Gorman 2013-10-25T14:35:20+00:00
How to discourage open source contributions
Dan Luu 2013-10-27T00:00:00+00:00
Testing exit values in Bash
Arabesque 2013-10-28T05:56:37+00:00
Been thinking about Skull Kid and Majora’s Mask. The more...
Zac Gorman 2013-10-28T20:05:34+00:00
Photo
Infraspace 2013-11-04T19:51:27+00:00
Thanks.
Infraspace 2013-11-05T20:16:44+00:00
Why hardware development is hard
Dan Luu 2013-11-10T00:00:00+00:00
Oblivion Territory: Tree vs. Palm
Simonschreibt. 2013-11-15T21:51:46+00:00
Photo
Zac Gorman 2013-11-22T15:19:48+00:00
Sacred 2 – Floating Point Numbers
Simonschreibt. 2013-11-22T17:53:19+00:00
Photo
Zac Gorman 2013-11-22T20:06:08+00:00
dinner, 11-25-13
food bores me 2013-11-26T04:48:00+00:00
despair snack, 11-26-13
food bores me 2013-11-26T21:46:13+00:00
Out of band mergings
Evennia Devblog RSS Feed 2013-11-28T00:00:00+00:00
breakfast, 11-30-13
food bores me 2013-11-30T15:41:01+00:00
Handmade Normal Maps
Simonschreibt. 2013-12-03T23:02:56+00:00
lunch, close week, 12-2-03
food bores me 2013-12-04T05:31:00+00:00
PCA is not a panacea
Dan Luu 2013-12-13T00:00:00+00:00
Imaginary Realities is back
Evennia Devblog RSS Feed 2013-12-16T00:00:00+00:00
square flower
jordan orelli 2013-12-23T17:18:21+00:00
Data alignment and caches
Dan Luu 2014-01-02T00:00:00+00:00
Tomb Raider – Laras Hot Secrets
Simonschreibt. 2014-01-02T12:58:12+00:00
Hoplite News
Magma Fortress 2014-01-04T07:29:00+00:00
Do programmers need math?
Dan Luu 2014-01-09T00:00:00+00:00
Whipped this up on my lunch break to let everybody know...
Zac Gorman 2014-01-09T19:42:40+00:00
Prey – Evil Buttons
Simonschreibt. 2014-01-09T23:25:53+00:00
lunch, 1-8-14
food bores me 2014-01-10T06:53:00+00:00
Photo
Zac Gorman 2014-01-10T19:38:01+00:00
Looking forwards and backwards
Evennia Devblog RSS Feed 2014-01-24T00:00:00+00:00
pulse
jordan orelli 2014-01-26T21:40:00+00:00
additive test
jordan orelli 2014-01-28T13:37:21+00:00
FUN FACT: (1) head of cabbage, when mixed with (1) potato and...
food bores me 2014-01-29T00:35:00+00:00
loving where this blog is going!!!
food bores me 2014-01-29T22:10:57+00:00
no water, no fish
jordan orelli 2014-02-02T22:30:00+00:00
Photo
Zac Gorman 2014-02-04T20:57:00+00:00
I promised you failure, and lo, here it is. This is a large bowl...
food bores me 2014-02-06T06:10:00+00:00
Moving from Google Code to Github
Evennia Devblog RSS Feed 2014-02-08T00:00:00+00:00
Why don't schools teach debugging?
Dan Luu 2014-02-08T00:00:00+00:00
CAUTION INSERT SECURELY LEST POWER CORD SHOULD BE DETACHED IN...
Infraspace 2014-02-10T16:25:00+00:00
Algorithms and Data structures books: One size doesn't fit them all
Fabien Sanglard 2014-02-14T01:08:45+00:00
This is my entire dinner. This is it, this is what a grown adult...
food bores me 2014-02-19T03:01:00+00:00
Don’t starve, Diablo – Parallax 7
Simonschreibt. 2014-02-24T22:38:37+00:00
Poupée de Son - Narrative Game based on Grimm’s “Hare’s...
winnie song 2014-02-25T04:44:00+00:00
ULFBERT - Name of an unusually strong and lasting Scandinavian...
winnie song 2014-02-25T04:45:00+00:00
Repair - A game about fortifying the ground you stand on. [DL...
winnie song 2014-02-25T04:45:00+00:00
Hellsmouth Concept
winnie song 2014-02-25T04:46:00+00:00
Hellsmouth Concept Animation - 2012
winnie song 2014-02-25T04:47:00+00:00
That time Oracle tried to have a professor fired for benchmarking their database
Dan Luu 2014-03-05T00:00:00+00:00
Too big to believe
How to Spot a Psychopath 2014-03-08T00:06:00+00:00
That bogus gender gap article
Dan Luu 2014-03-09T00:00:00+00:00
7DRL Preparation
Magma Fortress 2014-03-09T03:45:00+00:00
7DRL: Day 2
Magma Fortress 2014-03-10T15:12:00+00:00
7DRL: Day 3
Magma Fortress 2014-03-11T15:13:00+00:00
The Computer Graphics Library
Fabien Sanglard 2014-03-12T01:08:45+00:00
7DRL: Day 4
Magma Fortress 2014-03-12T14:58:00+00:00
7DRL: Day 7 (Ragtag is a success)
Magma Fortress 2014-03-15T12:55:00+00:00
Editing binaries
Dan Luu 2014-03-23T00:00:00+00:00
🕸️ My Journey Into Programming
Nathan Youngman 2014-03-30T00:00:00+00:00
Git Source Code Review
Fabien Sanglard 2014-03-30T01:08:45+00:00
Ate this two week old kale and beet salad with spicy peanut...
food bores me 2014-04-01T03:09:21+00:00
This weekend, I’m taking a couple days off from my book...
Zac Gorman 2014-04-02T00:29:04+00:00
Windows AC/Row/Infinite
Simonschreibt. 2014-04-02T20:59:03+00:00
Data-driven bug finding
Dan Luu 2014-04-06T00:00:00+00:00
Some GBC-style sprites I did for Frog Egg. (I reuploaded because...
i make video games 2014-04-09T20:47:08+00:00
My book is finally announced! Costume Quest: Invasion of the...
Zac Gorman 2014-04-10T16:42:00+00:00
happy egg friday
i make video games 2014-04-18T17:50:14+00:00
Whatever this frozen peas + eggplant + can of kidney beans...
food bores me 2014-04-22T14:03:00+00:00
necking
jordan orelli 2014-04-25T19:10:13+00:00
I was on Etsy’s Instagram feed the other day. Neat :)
jordan orelli 2014-04-28T20:56:02+00:00
Did some pixel art of various cat-based tiles for a weird game...
i make video games 2014-04-29T06:41:58+00:00
My Lunch Monstrosity, A Greatest Hits Album Featuring such...
food bores me 2014-04-29T15:02:07+00:00
I made some cats
i make video games 2014-04-30T08:40:17+00:00
As a follow up to my past posts, I’ve finally separated...
i make video games 2014-05-01T08:56:00+00:00
Your art is super awesome :)
i make video games 2014-05-02T03:56:41+00:00
Thanks to the Stanley Cup Playoffs, a very jam-packed work...
food bores me 2014-05-06T03:39:00+00:00
Listen, it was 8 in the morning, I was wildly hungover, the bus...
food bores me 2014-05-12T14:40:43+00:00
Shamus Keyboard Woes Explained
int10h.org - VileR's blog 2014-05-13T10:43:38+00:00
self portrait, cut paper
jordan orelli 2014-05-14T12:22:42+00:00
A 2-player one screen game in which the players take turns...
winnie song 2014-05-15T09:15:00+00:00
Welcome to the new ASCIImator!
Posts on asie's blog 2014-05-15T23:00:00+00:00
Arithmetic Games Set 1: a Peek into One of the First-Ever IBM PC Games
int10h.org - VileR's blog 2014-05-15T23:54:46+00:00
Imaginary Realities volume 6, issue 1
Evennia Devblog RSS Feed 2014-05-16T00:00:00+00:00
New movement options for Hoplite
Magma Fortress 2014-05-16T13:33:00+00:00
Newsletter #1 - A New Hope
Neovim 2014-06-06T00:00:00+00:00
Trespasser: Jurassic Park CG Source Code Review
Fabien Sanglard 2014-06-10T01:08:45+00:00
It’s a long way off still, but started mocking up a music...
i make video games 2014-06-12T21:42:00+00:00
Bringing back Python memory
Evennia Devblog RSS Feed 2014-06-15T00:00:00+00:00
Mer-Maid Manor (2P/Playbot) Can you clean the manor before the...
Zac Gorman 2014-06-26T14:04:43+00:00
Webby stuff
Evennia Devblog RSS Feed 2014-06-30T00:00:00+00:00
On the pulverisation of potatoes
How to Spot a Psychopath 2014-07-02T05:37:34+00:00
No time to make a meal between work shifts? Why not whip...
food bores me 2014-07-02T17:02:04+00:00
Newsletter #2 - Perchance to Dream
Neovim 2014-07-04T00:00:00+00:00
Frankenphone
How to Spot a Psychopath 2014-07-06T02:32:28+00:00
My Game Boy music maker, Bleep, has come a long way already! You...
i make video games 2014-07-06T19:28:00+00:00
I made this little jingle in Bleep. Not sure what it’s...
i make video games 2014-07-08T01:40:38+00:00
This is the battle screen for an RPG I was once making as a...
i make video games 2014-07-18T02:27:36+00:00
cutie witch girl (thing I never finished)
i make video games 2014-07-18T03:56:29+00:00
Hey, your bleep music creator sounds very awesome, I would definitely like to try it out some day. The main plus is that it looks way more simple than LSDJ, the only minus I see for now is short length of tracks (only about 3 minutes) - a lot of compositions are longer than that ;o I hope it will have a possibility to make longer tracks in the future :) Keep up the good work!
i make video games 2014-07-18T04:26:06+00:00
Some ink doodles I did with my Pentel brush pen a while back
i make video games 2014-07-18T04:57:36+00:00
This is a map that I was making for an exploration sidescroller...
i make video games 2014-07-18T18:17:00+00:00
Here’s a GIF full of spooky warped faces that move...
i make video games 2014-07-18T19:24:00+00:00
Whoops, my earlier GIF was fixed. Need a bunch of optimizations for...
i make video games 2014-07-18T20:39:00+00:00
Hoplite 2.3 progress
Magma Fortress 2014-07-19T05:08:00+00:00
Revenants, a Metroid-style sidescroller I was making at one...
i make video games 2014-07-21T04:26:00+00:00
Game Boy Wavy Scanline Effect #2 (better quality:...
i make video games 2014-07-21T05:01:00+00:00
Hoplite 2.3 progress II
Magma Fortress 2014-07-25T13:12:00+00:00
Leaderboards and balance changes for Hoplite
Magma Fortress 2014-08-03T02:23:00+00:00
Dance my puppets
Evennia Devblog RSS Feed 2014-08-04T00:00:00+00:00
A Grim Fandango poster that I worked to death (no pun...
Zac Gorman 2014-08-04T14:51:48+00:00
Bug fix release for Hoplite
Magma Fortress 2014-08-06T09:54:00+00:00
Game Engine Black Books
Fabien Sanglard 2014-08-07T01:08:45+00:00
Let's compile like it's 1992
Fabien Sanglard 2014-08-10T01:08:45+00:00
Google wage fixing, 11-CV-02509-LHK, ORDER DENYING PLAINTIFFS' MOTION FOR PRELIMINARY APPROVAL OF SETTLEMENTS WITH ADOBE, APPLE, GOOGLE, AND INTEL
Dan Luu 2014-08-14T00:00:00+00:00
Verilog Won & VHDL Lost? — You Be The Judge!
Dan Luu 2014-08-14T00:00:00+00:00
here’s the thing. I hate wasting food. if you and I are in...
food bores me 2014-08-15T15:01:17+00:00
The Road to Alpha, Week 25 - Imperfect Knowledge
Citybound Devblog 2014-08-27T01:30:20+00:00
Magma Music
Magma Fortress 2014-08-27T14:30:00+00:00
The Road to Alpha, Week 26 - Commute & Competition
Citybound Devblog 2014-09-03T02:00:38+00:00
Cards are the Future
Magma Fortress 2014-09-04T12:57:00+00:00
Newsletter #3 - Better Late than Never
Neovim 2014-09-06T00:00:00+00:00
Another bug fix release for Hoplite
Magma Fortress 2014-09-06T03:28:00+00:00
The Road to Alpha, Week 27 - Front Lawn Freeway
Citybound Devblog 2014-09-09T22:18:34+00:00
Plogue livenes
Plogue R&D 2014-09-11T14:49:00+00:00
STILL is a game about your hometown. [DL] Design, Visual | Made...
winnie song 2014-09-13T05:09:00+00:00
THE FOUR is a 1-player strategy game. You play as one of four...
winnie song 2014-09-13T05:17:00+00:00
Do Something is a 4-player local multiplayer game where you work...
winnie song 2014-09-13T17:08:50+00:00
WHISTLEBLOWER - An AGS game about whistleblowing. You play as a...
winnie song 2014-09-13T17:08:51+00:00
Parallel|Stitch is a game inspired by Sophie Houlden’s...
winnie song 2014-09-13T17:08:53+00:00
LBVQ is a game that teaches binary very quickly. | Made...
winnie song 2014-09-13T17:08:55+00:00
THIEF is a game where you are waiting for the bus with a...
winnie song 2014-09-13T17:08:57+00:00
First Impressions is a game about meeting the in-laws for the...
winnie song 2014-09-13T17:09:03+00:00
ASAP is a game about scheduling your boss’s life on your...
winnie song 2014-09-13T17:09:05+00:00
LIGHTRAFT is a game played with a MIDI controller and keen...
winnie song 2014-09-13T17:16:00+00:00
DRUNKWALK is a game about calling it a night – Let your heavy...
winnie song 2014-09-13T17:17:00+00:00
Bloodsport is a game about being in the woods with a beast and a...
winnie song 2014-09-13T17:17:01+00:00
BOSS A game about the interaction between a player and a hostile,...
winnie song 2014-09-13T18:49:00+00:00
onipress: FREE preview of idrawnintendo and doublefine’s...
Zac Gorman 2014-09-15T19:05:43+00:00
14 rough ideas for SEGA t-shirts, done as an exercise. Maybe I...
Zac Gorman 2014-09-16T04:03:59+00:00
Another batch of SEGA t-shirt ideas. I had to complete the...
Zac Gorman 2014-09-16T20:58:30+00:00
The Road to Alpha, Week 28 - You Cut Me Off!
Citybound Devblog 2014-09-17T02:47:34+00:00
chipspeech Diary, Part 1
Plogue R&D 2014-09-18T17:14:00+00:00
I made this GIF of all the eversions from Eversion NES. (This is...
i make video games 2014-09-19T08:20:00+00:00
The Road to Alpha, Week 29 - Exciting Times
Citybound Devblog 2014-09-23T23:54:51+00:00
Slowly moving through town
Evennia Devblog RSS Feed 2014-10-02T00:00:00+00:00
The Road to Alpha, Week 30 - New Place!
Citybound Devblog 2014-10-07T23:05:24+00:00
The Road to Alpha, Week 31 - New Place, For Real!
Citybound Devblog 2014-10-15T00:06:22+00:00
Assembly v. intrinsics
Dan Luu 2014-10-19T00:00:00+00:00
Something New: Livestream Reviews
Citybound Devblog 2014-10-20T11:17:11+00:00
10/22 Livestream Review
Citybound Devblog 2014-10-23T18:15:44+00:00
Mandelbrot: The Game
Magma Fortress 2014-10-24T14:24:00+00:00
Horse Simulator
Magma Fortress 2014-10-25T06:13:00+00:00
The Battlestar Encyclopedia
Magma Fortress 2014-10-26T05:25:00+00:00
Minehunter
Magma Fortress 2014-10-27T13:21:00+00:00
Conway's Game of Slime Creatures
Magma Fortress 2014-10-28T09:03:00+00:00
Still Life
Magma Fortress 2014-10-29T04:24:00+00:00
10/31 Special Announcement
Citybound Devblog 2014-10-31T12:13:43+00:00
Caches: LRU v. random
Dan Luu 2014-11-03T00:00:00+00:00
Testing v. informal reasoning
Dan Luu 2014-11-03T00:00:00+00:00
CLWB and PCOMMIT
Dan Luu 2014-11-05T00:00:00+00:00
Newsletter #4 - Thanksvimming Day
Neovim 2014-11-07T00:00:00+00:00
Literature review on the benefits of static types
Dan Luu 2014-11-07T00:00:00+00:00
Prompt directory shortening
Arabesque 2014-11-07T09:13:47+00:00
Rust, Lifetimes, and Collections - Faultlore
Faultlore 2014-11-09T00:00:00+00:00
How often is the build broken?
Dan Luu 2014-11-10T00:00:00+00:00
The ol' Ball and Chain
Magma Fortress 2014-11-10T12:30:00+00:00
Speeding up this site by 50x
Dan Luu 2014-11-17T00:00:00+00:00
One week of bugs
Dan Luu 2014-11-18T00:00:00+00:00
The Road to Alpha, Week 36 - A Sign of Life
Citybound Devblog 2014-11-18T14:13:04+00:00
TF-IDF linux commits
Dan Luu 2014-11-24T00:00:00+00:00
Photo
Zac Gorman 2014-11-25T00:35:11+00:00
The Road to Alpha, Week 37 - Imaginary Progress
Citybound Devblog 2014-11-25T22:37:58+00:00
Zelda Wind Waker – Hyrule Travel Guide
Simonschreibt. 2014-11-26T20:49:40+00:00
Markets, discrimination, and "lowering the bar"
Dan Luu 2014-12-01T00:00:00+00:00
The Road to Alpha, Week 38 - Curve Control
Citybound Devblog 2014-12-03T02:15:53+00:00
Malloc tutorial
Dan Luu 2014-12-04T00:00:00+00:00
Forever Mining Print now available at Fangamer!
nimasprout - Art by Nicole Gustafsson 2014-12-09T02:11:00+00:00
Crafty Wonderland Colossal Holiday Show in Portland
nimasprout - Art by Nicole Gustafsson 2014-12-11T16:30:00+00:00
The Road to Alpha, Week 40 - Hyper-Active
Citybound Devblog 2014-12-16T22:37:43+00:00
Integer overflow checking cost
Dan Luu 2014-12-17T00:00:00+00:00
2014 in Review
Citybound Devblog 2014-12-25T17:37:31+00:00
A review of the Julia language
Dan Luu 2014-12-28T00:00:00+00:00
chipspeech Diary, Part 2
Plogue R&D 2014-12-29T22:47:00+00:00
BADBLOOD BADBLOOD is a deadly game of hide & seek. It is a...
winnie song 2014-12-31T17:25:00+00:00
New Year - New Solo at Gallery 1988
nimasprout - Art by Nicole Gustafsson 2015-01-02T16:33:00+00:00
Rei Ayanami – Inner eyes
Simonschreibt. 2015-01-07T12:35:29+00:00
Developer Diary #1: Where do you think you're going?
Citybound Devblog 2015-01-07T23:42:48+00:00
Developer Diary #2: Intersection soup
Citybound Devblog 2015-01-10T23:11:42+00:00
What's new in CPUs since the 80s?
Dan Luu 2015-01-11T00:00:00+00:00
Cute Frog! A fun little visual novel mockup I started on. GBC...
i make video games 2015-01-11T04:53:12+00:00
Developer Diary #3: The Struggle
Citybound Devblog 2015-01-12T15:13:57+00:00
Pop Terrariums at Gallery 1988
nimasprout - Art by Nicole Gustafsson 2015-01-14T20:17:00+00:00
Developer Diary #4: Traffic Anarchy
Citybound Devblog 2015-01-14T22:52:36+00:00
A HashMap in Rust - What's a HashMap? - Faultlore
Faultlore 2015-01-15T00:00:00+00:00
Building Django proxies and MUD libraries
Evennia Devblog RSS Feed 2015-01-19T00:00:00+00:00
Blog monetization
Dan Luu 2015-01-24T00:00:00+00:00
Snoopy Valentine - Official Print Release with Dark Hall Mansion
nimasprout - Art by Nicole Gustafsson 2015-01-28T18:08:00+00:00
Shell config subfiles
Arabesque 2015-01-29T11:01:09+00:00
BuildCraft History and Design
Posts on asie's blog 2015-01-29T23:00:00+00:00
CPU backdoors
Dan Luu 2015-02-03T00:00:00+00:00
AI doesn't have to be very good to displace humans
Dan Luu 2015-02-15T00:00:00+00:00
Goodhearting IQ, cholesterol, and tail latency
Dan Luu 2015-03-05T00:00:00+00:00
Developer Diary #5: Back to Business
Citybound Devblog 2015-03-05T22:29:50+00:00
Challenge Mode
Magma Fortress 2015-03-06T22:52:00+00:00
What happens when you load a URL?
Dan Luu 2015-03-07T00:00:00+00:00
Challenge Mode Progress
Magma Fortress 2015-03-09T11:21:00+00:00
Given that we spend little effort on testing, how should we test software?
Dan Luu 2015-03-10T00:00:00+00:00
Hoplite Challenge Mode is ready
Magma Fortress 2015-03-15T09:32:00+00:00
Postcard Correspondence opens at Gallery 1988 tonight!
nimasprout - Art by Nicole Gustafsson 2015-03-20T15:54:00+00:00
Developer Diary #6: Zoning, Struggling, Parceling
Citybound Devblog 2015-03-21T22:17:08+00:00
Reading citations is easier than most people think
Dan Luu 2015-03-29T00:00:00+00:00
New Prints for Spring/Summer.
nimasprout - Art by Nicole Gustafsson 2015-03-31T21:35:00+00:00
Developer Diary #7 - The Economic Model
Citybound Devblog 2015-04-01T14:11:52+00:00
Newsletter #5 - Out of the Box
Neovim 2015-04-03T00:00:00+00:00
CGA in 1024 Colors - a New Mode: the Illustrated Guide
int10h.org - VileR's blog 2015-04-15T20:56:27+00:00
Mouse Guard: Legends of the Guard Vol 3, #2
nimasprout - Art by Nicole Gustafsson 2015-04-19T17:56:00+00:00
Pre-Pooping Your Pants With Rust - Faultlore
Faultlore 2015-04-27T00:00:00+00:00
DevDiary #8 - Technical Background Work
Citybound Devblog 2015-04-27T16:17:39+00:00
Photo
♘ 2015-04-29T09:30:09+00:00
Photo
♘ 2015-04-30T09:30:14+00:00
Photo
♘ 2015-05-03T12:52:08+00:00
We used to build steel mills near cheap power. Now that's where we build datacenters
Dan Luu 2015-05-04T00:00:00+00:00
Crafty Wonderland Colossal Spring Sale this May 9th
nimasprout - Art by Nicole Gustafsson 2015-05-07T03:55:00+00:00
Documenting Python without Sphinx
Evennia Devblog RSS Feed 2015-05-09T00:00:00+00:00
Things goin on
Evennia Devblog RSS Feed 2015-05-11T00:00:00+00:00
Crafty Wonderland Recap
nimasprout - Art by Nicole Gustafsson 2015-05-12T20:39:00+00:00
Haunted Depths - New Print at Tiny Showcase
nimasprout - Art by Nicole Gustafsson 2015-05-13T16:46:00+00:00
Photo
♘ 2015-05-16T16:32:28+00:00
Advantages of monorepos
Dan Luu 2015-05-17T00:00:00+00:00
Challenge Mode Comes to Android and iOS
Magma Fortress 2015-05-18T10:11:00+00:00
A defense of boring languages
Dan Luu 2015-05-25T00:00:00+00:00
The googlebot monopoly
Dan Luu 2015-05-27T00:00:00+00:00
Dreaming big?
Evennia Devblog RSS Feed 2015-05-30T00:00:00+00:00
Slashdot and Sourceforge
Dan Luu 2015-05-31T00:00:00+00:00
Rust, Generics, and Collections - Faultlore
Faultlore 2015-06-03T00:00:00+00:00
Rust Collections Case Study: BTreeMap - Faultlore
Faultlore 2015-06-05T00:00:00+00:00
Photo
♘ 2015-06-08T15:05:11+00:00
June 2015 Update (Mystery Feature)
Citybound Devblog 2015-06-09T13:27:12+00:00
Photo
♘ 2015-06-10T10:45:03+00:00
Fantastical Flora and Fauna exhibit at Gallery Nucleus
nimasprout - Art by Nicole Gustafsson 2015-06-12T00:16:00+00:00
Need your help!
Evennia Devblog RSS Feed 2015-06-15T00:00:00+00:00
Artwork from Fantastical Fauna and Flora at Gallery Nucleus
nimasprout - Art by Nicole Gustafsson 2015-06-15T15:35:00+00:00
The Road to Alpha, Week 66 - More on Planning Mode
Citybound Devblog 2015-06-20T01:42:35+00:00
Announcing the Evennia example-game project "Ainneve"
Evennia Devblog RSS Feed 2015-06-22T00:00:00+00:00
Recent Shows at iam8bit Gallery in LA
nimasprout - Art by Nicole Gustafsson 2015-06-24T17:05:00+00:00
Out and about at Mt. Rainier National Forest
nimasprout - Art by Nicole Gustafsson 2015-07-06T15:20:00+00:00
Discrete Arctan in 6502
dustmop.io blog 2015-07-22T15:18:43+00:00
Sacred 2 – Fake Mirror
Simonschreibt. 2015-07-23T00:11:59+00:00
Bag Review: National Geographic A2540
Steve Losh 2015-07-24T18:42:00+00:00
Bag Review: National Geographic MC5350
Steve Losh 2015-07-26T13:35:00+00:00
Photo
♘ 2015-07-30T15:10:50+00:00
Batsly Adams – Star Versus Production
dustmop.io blog 2015-07-31T18:21:23+00:00
New Design
int10h.org - VileR's blog 2015-08-03T05:14:28+00:00
Common ain't no language I ever heard of!
Smerg Development Journal 2015-08-04T17:32:32+00:00
This modern world
Smerg Development Journal 2015-08-05T16:47:47+00:00
8088 MPH Final: Old vs. New CGA (and Other Gory Details)
int10h.org - VileR's blog 2015-08-07T00:39:06+00:00
Resources, Part II
Smerg Development Journal 2015-08-08T16:38:21+00:00
August 2015 Update - A week with Michael
Citybound Devblog 2015-08-09T10:25:14+00:00
101 Monochrome Mazes: Why Not Color?
int10h.org - VileR's blog 2015-08-09T13:31:32+00:00
untitled
Smerg Development Journal 2015-08-14T17:15:34+00:00
Skilling it up
Smerg Development Journal 2015-08-15T07:42:11+00:00
Render Hell – Book V
Simonschreibt. 2015-08-16T17:00:36+00:00
Render Hell – Book IV
Simonschreibt. 2015-08-16T17:01:25+00:00
Render Hell – Book III
Simonschreibt. 2015-08-16T17:02:55+00:00
Render Hell – Book II
Simonschreibt. 2015-08-16T17:03:20+00:00
Render Hell – Book I
Simonschreibt. 2015-08-16T17:04:42+00:00
Render Hell 2.0
Simonschreibt. 2015-08-16T17:05:21+00:00
Ghost in the Finite State Machine
Smerg Development Journal 2015-08-16T17:13:36+00:00
Photo
♘ 2015-08-17T17:15:27+00:00
Blocking blocks block path! We go NOWHERE!
Smerg Development Journal 2015-08-19T19:05:26+00:00
Reading postmortems
Dan Luu 2015-08-20T00:00:00+00:00
Photo
♘ 2015-08-25T09:38:37+00:00
Photo
♘ 2015-08-25T09:38:47+00:00
A wagon load of post summer updates
Evennia Devblog RSS Feed 2015-08-27T00:00:00+00:00
Accounting Department
Smerg Development Journal 2015-08-27T21:10:57+00:00
Steve Yegge's prediction record
Dan Luu 2015-08-31T00:00:00+00:00
atonal 2015
@mntmn 2015-09-05T13:50:05+00:00
atonal 2015
@mntmn 2015-09-05T13:52:07+00:00
atonal 2015
@mntmn 2015-09-05T13:53:27+00:00
heart and penis sprites we made on commodore 128 in BASIC
@mntmn 2015-09-05T13:55:10+00:00
some breakbeats in ohm that i liked
@mntmn 2015-09-05T13:56:30+00:00
uridium 2 intro on amiga 1200
@mntmn 2015-09-05T13:58:15+00:00
meganalicerose: When your shoes match your leggings 💁...
@mntmn 2015-09-07T07:46:56+00:00
Interim OS running on Interim computer prototype. (Details at...
@mntmn 2015-09-09T13:53:40+00:00
Hold the RESET button while turning the power off
Smerg Development Journal 2015-09-09T18:16:38+00:00
untitled
Smerg Development Journal 2015-09-09T19:19:58+00:00
Re-Re-Revisiting Skills
Smerg Development Journal 2015-09-13T19:47:50+00:00
Lightbulb over head
Smerg Development Journal 2015-09-14T18:40:37+00:00
Out and about at Cannon Beach, Oregon
nimasprout - Art by Nicole Gustafsson 2015-09-15T19:46:00+00:00
Photo
@mntmn 2015-09-15T22:19:29+00:00
More ideas regarding Exits
Smerg Development Journal 2015-09-16T07:15:30+00:00
Photo
♘ 2015-09-17T18:24:09+00:00
Changed changes of changing
Smerg Development Journal 2015-09-18T18:17:02+00:00
One step back, two steps forward
Smerg Development Journal 2015-09-21T17:55:44+00:00
New Shop!
nimasprout - Art by Nicole Gustafsson 2015-09-21T18:40:00+00:00
The Road to Alpha, Week 89 - Theory and Practice
Citybound Devblog 2015-09-23T03:40:24+00:00
ALL the resources!
Smerg Development Journal 2015-09-23T04:13:25+00:00
Pushing through a straw
Evennia Devblog RSS Feed 2015-09-24T00:00:00+00:00
Photo
Infraspace 2015-09-26T06:10:32+00:00
Photo
Infraspace 2015-09-26T06:18:44+00:00
Oh, right, that.
Smerg Development Journal 2015-09-26T06:28:54+00:00
Enter the new Exits
Smerg Development Journal 2015-09-28T19:36:46+00:00
Evennia on `podcast.__init__`
Evennia Devblog RSS Feed 2015-09-29T00:00:00+00:00
Slowlock
Dan Luu 2015-09-30T00:00:00+00:00
Video
@mntmn 2015-09-30T09:10:09+00:00
Trust your technolust. (Get a Cyberdelia sticker to go with your...
Cyberdelia NYC 2015-10-01T02:15:21+00:00
All the shows I forgot to post.
nimasprout - Art by Nicole Gustafsson 2015-10-01T16:25:00+00:00
Emoting System
Evennia Devblog RSS Feed 2015-10-02T00:00:00+00:00
Another quick fun idea
Smerg Development Journal 2015-10-03T05:01:48+00:00
Why Intel added cache partitioning
Dan Luu 2015-10-04T00:00:00+00:00
Watchdog – Problems
Simonschreibt. 2015-10-04T14:01:43+00:00
Watchdog – Gallery
Simonschreibt. 2015-10-04T14:02:24+00:00
Watchdog – Mail
Simonschreibt. 2015-10-04T14:03:27+00:00
Watchdog – Compare
Simonschreibt. 2015-10-04T14:04:07+00:00
Watchdog – Convert
Simonschreibt. 2015-10-04T14:06:58+00:00
Watchdog – Take Screenshots
Simonschreibt. 2015-10-04T14:07:15+00:00
Watchdog – Prepare your Game
Simonschreibt. 2015-10-04T14:08:10+00:00
Watchdog – Structure
Simonschreibt. 2015-10-04T14:09:21+00:00
Watchdog Script
Simonschreibt. 2015-10-04T14:10:01+00:00
Diablo Gate
Simonschreibt. 2015-10-08T09:27:26+00:00
“Never send a boy to do a woman’s job.” –Acid Burn. Nice initial...
Cyberdelia NYC 2015-10-08T20:08:57+00:00
more dogs have been to space than people who genuinely love you
@mntmn 2015-10-09T09:43:26+00:00
Illustrations and soaps
Evennia Devblog RSS Feed 2015-10-11T00:00:00+00:00
How do computers have a sense of time?
@mntmn 2015-10-11T10:57:27+00:00
inblack-wetrust: Undercover ss2014
@mntmn 2015-10-12T14:57:00+00:00
Open Assets via Text
Simonschreibt. 2015-10-12T21:39:45+00:00
Photo
@mntmn 2015-10-12T23:39:50+00:00
Halloween Prints and more!
nimasprout - Art by Nicole Gustafsson 2015-10-13T17:59:00+00:00
Meanwhile in another dimension...
Smerg Development Journal 2015-10-14T05:55:48+00:00
It's Aliiiiiive...
daftmike's blog 2015-10-16T07:15:00+00:00
Pumpkin Grove Print Set
nimasprout - Art by Nicole Gustafsson 2015-10-16T16:28:00+00:00
Teen Who Hacked CIA Director’s Email Tells How He Did...
Cyberdelia NYC 2015-10-21T17:05:07+00:00
X:Rebirth – Geometric Lensflares
Simonschreibt. 2015-10-23T19:52:16+00:00
eightninea: Moogfest —
@mntmn 2015-10-25T13:50:10+00:00
CTC Bizer Duplicator... My new 3D printer
daftmike's blog 2015-10-27T11:40:00+00:00
Photo
♘ 2015-10-27T12:39:03+00:00
there is no problem for which X11 forwarding is the correct solution
@mntmn 2015-10-28T00:30:02+00:00
this computer has an identity crisis
@mntmn 2015-10-29T23:04:18+00:00
Infinite disk
Dan Luu 2015-11-01T00:00:00+00:00
Photo
@mntmn 2015-11-01T21:48:10+00:00
Little Blue Box - Jobs And Wozniak on Phone Phreaking “Before...
Cyberdelia NYC 2015-11-05T17:46:11+00:00
Mystery Toronto Artist Gives Payphones a Makeover, including a...
Cyberdelia NYC 2015-11-06T21:09:17+00:00
Getting Optimal Apple ][ Screenshots w/NTSC Emulation
int10h.org - VileR's blog 2015-11-08T14:04:54+00:00
MIT uses Evennia!
Evennia Devblog RSS Feed 2015-11-12T00:00:00+00:00
The Road to Alpha, Week 96 - Committing to ...
Citybound Devblog 2015-11-12T01:12:21+00:00
Fallout 4 – Wasteland Eyes
Simonschreibt. 2015-11-17T13:52:45+00:00
Happy Little Words
Steve Losh 2015-11-20T18:43:00+00:00
What's worked in Computer Science: 1999 v. 2015
Dan Luu 2015-11-23T00:00:00+00:00
Photo
♘ 2015-11-23T21:07:53+00:00
What It Was Like When They Filmed Hackers At My High School
Cyberdelia NYC 2015-11-24T13:41:49+00:00
Why use ECC?
Dan Luu 2015-11-27T00:00:00+00:00
Trying out Beam.pro
Citybound Devblog 2015-11-27T01:21:14+00:00
Okay. Let’s go shopping. DADE: I’ll hack the...
Cyberdelia NYC 2015-11-27T15:56:48+00:00
Photo
♘ 2015-11-27T17:07:05+00:00
Michael left Citybound
Citybound Devblog 2015-11-27T22:25:08+00:00
system
@mntmn 2015-11-28T14:50:11+00:00
Just Beat the Data Out of It
Steve Losh 2015-11-30T16:10:00+00:00
Spotted in the wild. “Okay. Let’s go...
Cyberdelia NYC 2015-11-30T18:30:42+00:00
Hackers Oral History: How Did This Get Made
Cyberdelia NYC 2015-11-30T21:00:36+00:00
My first 3D design...
daftmike's blog 2015-12-01T08:36:00+00:00
Famed for Tango and Hackers
Cyberdelia NYC 2015-12-01T14:02:03+00:00
Photo
♘ 2015-12-01T19:05:49+00:00
Photo
♘ 2015-12-01T19:07:05+00:00
What is Color Banding? And what is it not?
Simonschreibt. 2015-12-02T19:00:20+00:00
Photo
♘ 2015-12-04T14:49:44+00:00
Braid – Respect the Rules
Simonschreibt. 2015-12-07T11:24:40+00:00
The winding, telephonic odyssey of Joybubbles, the original phone phreak
Cyberdelia NYC 2015-12-07T19:24:39+00:00
My BLT drive on my computer just went...
Cyberdelia NYC 2015-12-08T13:22:16+00:00
Newsletter #6 - Ship it!
Neovim 2015-12-09T00:00:00+00:00
What the Hell are Permutation Patterns?
Steve Losh 2015-12-10T19:55:00+00:00
I'm participating in a game jam this weekend
Citybound Devblog 2015-12-10T23:22:01+00:00
Files are hard
Dan Luu 2015-12-12T00:00:00+00:00
Ludum Dare 34 Postmortem
Steve Losh 2015-12-15T16:30:00+00:00
A summary of a year
Evennia Devblog RSS Feed 2015-12-17T00:00:00+00:00
Big companies v. startups
Dan Luu 2015-12-17T00:00:00+00:00
BigBlue Terminal: An Oldschool Fixed-Width Pixel Font
int10h.org - VileR's blog 2015-12-18T16:51:40+00:00
NES Graphics – Part 3
dustmop.io blog 2015-12-18T18:00:31+00:00
What RESTful actually means
Code Words 2015-12-19T09:00:00+00:00
Fallout 4 – The Mushroom Case
Simonschreibt. 2015-12-23T01:46:27+00:00
How to trick a neural network into thinking a panda is a vulture
Code Words 2015-12-23T09:00:00+00:00
cyberdelianyc: What, your mom buy you a ‘Puter for Christmas?...
Cyberdelia NYC 2015-12-25T13:01:38+00:00
Normalization of deviance
Dan Luu 2015-12-29T00:00:00+00:00
New Year prediction… (for 1996?) Kate: RISC architecture is...
Cyberdelia NYC 2016-01-04T20:34:38+00:00
Solo show premiering at Gallery 1988 (East)
nimasprout - Art by Nicole Gustafsson 2016-01-05T05:12:00+00:00
LinkNYC public Wi-Fi Finally Getting Installed. New York is...
Cyberdelia NYC 2016-01-05T13:57:18+00:00
Delayed Reference Method
Simonschreibt. 2016-01-05T16:47:05+00:00
A Promising 2016
Citybound Devblog 2016-01-06T23:35:09+00:00
Windows (and ClearType) vs. Truetype Fonts with Embedded Bitmaps
int10h.org - VileR's blog 2016-01-07T21:11:23+00:00
I make a Craft(Friends) check...
Smerg Development Journal 2016-01-08T17:02:11+00:00
We saw some really bad Intel CPU bugs in 2015 and we should expect to see more in the future
Dan Luu 2016-01-10T00:00:00+00:00
Photo
Cyberdelia NYC 2016-01-11T01:58:45+00:00
Banned of Brothers
Smerg Development Journal 2016-01-12T17:11:52+00:00
Diablo 3 – The sacred spiderweb
Simonschreibt. 2016-01-14T18:46:35+00:00
The Once and Future Weird Kids at Gallery 1988
nimasprout - Art by Nicole Gustafsson 2016-01-16T20:28:00+00:00
The Ultimate Oldschool PC Font Pack (v1.0)
int10h.org - VileR's blog 2016-01-16T22:07:06+00:00
Experiments with toner transfer...
daftmike's blog 2016-01-19T14:00:00+00:00
Alpha 1 – My Top 5 Usecases
Simonschreibt. 2016-01-22T15:31:12+00:00
Sampling v. tracing
Dan Luu 2016-01-24T00:00:00+00:00
Niagara calls
How to Spot a Psychopath 2016-01-30T05:38:21+00:00
Bill Gates Hacked His High School’s Computers to Be Placed in...
Cyberdelia NYC 2016-02-01T21:02:57+00:00
After just two years, I'm starting properly!
Citybound Devblog 2016-02-01T22:03:26+00:00
Blubb! – Fish Tanks in Games
Simonschreibt. 2016-02-02T21:09:12+00:00
Scanning for confidential information on external web servers
The Grymoire 2016-02-06T16:50:53+00:00
Diablo 3 – Wings of Angels
Simonschreibt. 2016-02-11T13:44:24+00:00
Photo
♘ 2016-02-11T17:57:59+00:00
Climbing up Branches
Evennia Devblog RSS Feed 2016-02-14T00:00:00+00:00
The long path of player generation
Smerg Development Journal 2016-02-15T09:22:47+00:00
A monumental day
Smerg Development Journal 2016-02-19T02:10:36+00:00
Terrain Generation with Midpoint Displacement
Steve Losh 2016-02-19T19:45:00+00:00
Dark Maus – Top Down Trees
Simonschreibt. 2016-02-24T16:06:04+00:00
So-called "IBM" Freeware Games from the Early '80s
int10h.org - VileR's blog 2016-02-26T09:02:11+00:00
Harry Potter and the Methods of Rationality review by su3su2u1
Dan Luu 2016-03-01T00:00:00+00:00
su3su2u1 physics tumblr archive
Dan Luu 2016-03-01T00:00:00+00:00
v1.2! never stop! #amiga
@mntmn 2016-03-01T18:55:01+00:00
Alien vs Wolfenstein – Cutting Torch
Simonschreibt. 2016-03-02T20:01:25+00:00
This just isn't functional
Code Words 2016-03-07T12:00:00+00:00
Recursive Midpoint Displacement
Steve Losh 2016-03-07T13:45:00+00:00
Image Processing 101
Code Words 2016-03-10T09:00:00+00:00
Lotus Text
dustmop.io blog 2016-03-10T17:43:10+00:00
Telling stories with data using the grammar of graphics
Code Words 2016-03-16T10:00:00+00:00
A Music Update From Dane
Citybound Devblog 2016-03-18T13:26:04+00:00
We only hire the trendiest
Dan Luu 2016-03-21T07:23:44+00:00
Olympiad: IBM Prototype Fonts Unearthed
int10h.org - VileR's blog 2016-03-22T22:36:46+00:00
Technical stuff happening
Evennia Devblog RSS Feed 2016-03-24T00:00:00+00:00
Immutability is not enough
Code Words 2016-03-29T10:00:00+00:00
Thermoelectric Drinks-Can Cooler
daftmike's blog 2016-03-30T18:22:00+00:00
Now in German: Eine kleine Statusberichterstattung
Citybound Devblog 2016-04-01T22:06:02+00:00
April Fools!
Citybound Devblog 2016-04-02T20:56:50+00:00
Google SRE book
Dan Luu 2016-04-11T08:00:58+00:00
How I'm getting along
Citybound Devblog 2016-04-18T00:17:49+00:00
Some programming blogs to consider reading
Dan Luu 2016-04-18T07:06:34+00:00
The Secrets of Medieval Fonts
medievalbooks 2016-04-29T10:48:27+00:00
"Celestial Spaces" opening at Flatcolor Gallery
nimasprout - Art by Nicole Gustafsson 2016-05-02T16:08:00+00:00
Cron best practices
Arabesque 2016-05-08T05:19:19+00:00
Evennia 0.6!
Evennia Devblog RSS Feed 2016-05-22T00:00:00+00:00
Dopefish goes NTSC: Commander Keen 4 Composite CGA Patch Notes
int10h.org - VileR's blog 2016-05-28T23:40:51+00:00
Background: A Tale of Two Worlds
Citybound Devblog 2016-05-29T19:11:38+00:00
Evennia in Pictures
Evennia Devblog RSS Feed 2016-05-31T00:00:00+00:00
The Wit.nes (demo)
dustmop.io blog 2016-06-03T17:01:56+00:00
Shifts in the blogging tide
Article on Coyote Cartography 2016-06-19T00:41:39+00:00
Terrain Generation with Diamond Square
Steve Losh 2016-06-27T13:35:00+00:00
What the Hell is Symbolic Computation?
Steve Losh 2016-06-29T13:30:00+00:00
The art of sharing nicks and descriptions
Evennia Devblog RSS Feed 2016-07-01T00:00:00+00:00
The Joy of VFX – Pintable
Simonschreibt. 2016-07-10T18:47:19+00:00
Background: An Architecture for Millions of Things
Citybound Devblog 2016-07-13T20:37:25+00:00
Yet another 16-color CGA makeover: Keen 5
int10h.org - VileR's blog 2016-07-17T23:08:51+00:00
Keen 4 Mystery Code Demystified
int10h.org - VileR's blog 2016-07-17T23:32:37+00:00
Adventures in SSL
Smerg Development Journal 2016-07-23T21:53:33+00:00
NESPi - my Mini NES Classic Raspberry Pi games console
daftmike's blog 2016-07-27T20:00:00+00:00
Mini NES Classic Updates
daftmike's blog 2016-08-01T15:42:00+00:00
Photo
♘ 2016-08-04T08:19:59+00:00
Slides: Demystifying Demakes
dustmop.io blog 2016-08-04T19:48:37+00:00
Notes on concurrency bugs
Dan Luu 2016-08-05T03:32:26+00:00
August 2016 Lisp Game Jam Postmortem
Steve Losh 2016-08-15T13:45:00+00:00
Happy 35th birthday, IBM PC!
int10h.org - VileR's blog 2016-08-18T13:28:29+00:00
Look at a computer chip up close and it almost looks like an...
Cyberdelia NYC 2016-08-18T20:04:48+00:00
Playing With Syntax
Steve Losh 2016-08-19T13:15:00+00:00
The Elegance of Deflate
codersnotes.com 2016-08-21T07:00:00+00:00
The Multi-Project Programmer
codersnotes.com 2016-08-26T07:00:00+00:00
The Metaprogrammer
codersnotes.com 2016-09-06T07:00:00+00:00
Learning To Wrangle Half-Floats
codersnotes.com 2016-09-10T07:00:00+00:00
How I learned to program
Dan Luu 2016-09-12T08:41:26+00:00
Debunking Euclideon's Unlimited Detail Tech
codersnotes.com 2016-09-13T07:00:00+00:00
Celebrating 21 years
Cyberdelia NYC 2016-09-15T15:13:14+00:00
the 7th hacker
Cyberdelia NYC 2016-09-15T15:16:04+00:00
Customizing Common Lisp's Iterate: Averaging
Steve Losh 2016-09-20T13:45:00+00:00
Weekly Programming Challenge #9
The Buckblog 2016-09-24T06:00:00+00:00
The last weeks were mostly spent with improving the tools and...
DeathTrash 2016-09-24T11:17:52+00:00
Is dev compensation bimodal?
Dan Luu 2016-09-27T06:33:26+00:00
More characters. (Little diversion from all that tools...
DeathTrash 2016-09-29T12:26:08+00:00
Weekly Programming Challenge #10
The Buckblog 2016-10-01T06:00:00+00:00
Untonemapping, and other stupid tricks
codersnotes.com 2016-10-02T07:00:00+00:00
I could do that in a weekend!
Dan Luu 2016-10-03T08:14:27+00:00
The Challenge Of Making Things
codersnotes.com 2016-10-04T07:00:00+00:00
Weekly Programming Challenge #11
The Buckblog 2016-10-08T06:00:00+00:00
All that work on the Level Editor is finally paying off and...
DeathTrash 2016-10-09T07:04:35+00:00
Hiring and the market for lemons
Dan Luu 2016-10-09T09:44:14+00:00
Customizing Common Lisp's Iterate: Timing
Steve Losh 2016-10-10T14:50:00+00:00
Data driven literary analysis
Code Words 2016-10-11T12:00:00+00:00
A tour of random forests
Code Words 2016-10-11T12:00:00+00:00
A history of storage media
Code Words 2016-10-11T12:00:00+00:00
Season of fixes
Evennia Devblog RSS Feed 2016-10-13T00:00:00+00:00
Weekly Programming Challenge #12
The Buckblog 2016-10-15T06:00:00+00:00
The Illusion Of Controls
codersnotes.com 2016-10-15T07:00:00+00:00
Programming book recommendations and anti-recommendations
Dan Luu 2016-10-16T08:06:34+00:00
“How’s it hanging?” Animations halfway done on this one. Need...
DeathTrash 2016-10-17T17:00:54+00:00
New Paintings Featuring Spell Cats
nimasprout - Art by Nicole Gustafsson 2016-10-18T17:19:00+00:00
UI improvements.
DeathTrash 2016-10-19T10:00:55+00:00
Custom commands
Arabesque 2016-10-22T05:37:44+00:00
Weekly Programming Challenge #13
The Buckblog 2016-10-22T06:00:00+00:00
HN: the good parts
Dan Luu 2016-10-23T00:00:00+00:00
Weekly Programming Challenge #14
The Buckblog 2016-10-29T06:00:00+00:00
Newsletter #7 - Summer of Road
Neovim 2016-11-01T00:00:00+00:00
Stormy Weather Arts Festival at Cannon Beach, Oregon.
nimasprout - Art by Nicole Gustafsson 2016-11-04T14:00:00+00:00
Weekly Programming Challenge #15
The Buckblog 2016-11-05T06:00:00+00:00
Working on combat this week and it’s beginning to feel a lot...
DeathTrash 2016-11-05T12:35:00+00:00
Photo
♘ 2016-11-08T13:19:31+00:00
Weekly Programming Challenge #16
The Buckblog 2016-11-12T07:00:00+00:00
"There is another challenge we must address – and it is the corrupting force of the vast sums of..."
LESSIG Blog 2016-11-13T06:24:47+00:00
Help me express the relative presidential voting power?
LESSIG Blog 2016-11-13T07:08:57+00:00
Worked on a lot of gameplay related things in the last week. For...
DeathTrash 2016-11-13T17:32:10+00:00
https://soundcloud.com/lessig/epstein-and-lessig-on-public-fundin...
LESSIG Blog 2016-11-14T06:18:49+00:00
One person, one vote? yea, right. The corruption that is the Electoral College
LESSIG Blog 2016-11-17T06:00:32+00:00
Is there a “who’s here” app?
LESSIG Blog 2016-11-17T09:21:37+00:00
Worked mostly on the framework this week. It’s now at a...
DeathTrash 2016-11-18T16:51:46+00:00
Weekly Programming Challenges -- Recap
The Buckblog 2016-11-19T07:00:00+00:00
Beating The Compiler
codersnotes.com 2016-11-28T08:00:00+00:00
Birthday retrospective
Evennia Devblog RSS Feed 2016-11-30T00:00:00+00:00
People, not acres, should count in a democracy. (And please...
LESSIG Blog 2016-11-30T09:03:18+00:00
Birthday retrospective
Griatch's Evennia musings 2016-11-30T14:38:00+00:00
Sneaking around.
DeathTrash 2016-12-02T12:24:51+00:00
Learning Via Bullshit
codersnotes.com 2016-12-06T08:00:00+00:00
Bubbles, Baseball, and Mr. Marsh
Article on Coyote Cartography 2016-12-07T20:15:48+00:00
So I’ve had my first “zero-carbon-footprint-you” threat
LESSIG Blog 2016-12-10T02:35:08+00:00
Assassin’s Creed: Black Flag – Waterplane
Simonschreibt. 2016-12-14T08:55:42+00:00
Form over frolic: Jony Ive’s quest for boring perfection
Article on Coyote Cartography 2016-12-16T18:30:52+00:00
Converging Towards Disneyland
codersnotes.com 2016-12-19T08:00:00+00:00
CHIP-8 in Common Lisp: The CPU
Steve Losh 2016-12-19T17:45:00+00:00
Point of view.
DeathTrash 2016-12-20T15:01:10+00:00
CHIP-8 in Common Lisp: Graphics
Steve Losh 2016-12-21T16:55:00+00:00
Process Roulette
The Buckblog 2016-12-23T07:00:00+00:00
CHIP-8 in Common Lisp: Input
Steve Losh 2016-12-23T16:00:00+00:00
Christmas 2016 Announcement
Citybound Devblog 2016-12-25T01:54:31+00:00
CHIP-8 in Common Lisp: Sound
Steve Losh 2016-12-26T17:30:00+00:00
It was cast in stone and iron so that it could not further...
DeathTrash 2016-12-27T21:57:11+00:00
Game #63: Dragon Slayer: The Legend of Heroes (TurboGrafx-CD) - It's In Your Hands Now (Finished)
The RPG Consoler 2016-12-30T00:06:00+00:00
Game #64: Exile (TurboGrafx-CD) - Another Arabian Night (Finished)
The RPG Consoler 2017-01-01T02:34:00+00:00
Below the Cut: Spiritual Warfare (NES, Genesis, Game Boy)
The RPG Consoler 2017-01-02T08:08:00+00:00
CHIP-8 in Common Lisp: Disassembly
Steve Losh 2017-01-02T17:15:00+00:00
Game #65: Soul Blazer (SNES) - Restore the World (Finished)
The RPG Consoler 2017-01-03T02:18:00+00:00
CHIP-8 in Common Lisp: Debugging Infrastructure
Steve Losh 2017-01-05T16:40:00+00:00
Rest Well 1992; Welcome 1993
The RPG Consoler 2017-01-07T21:19:00+00:00
What I did
Citybound Devblog 2017-01-09T20:19:37+00:00
Game #66: Ultima: Warriors of Destiny (NES) - Promised Destiny
The RPG Consoler 2017-01-10T04:23:00+00:00
CHIP-8 in Common Lisp: Menus
Steve Losh 2017-01-10T16:20:00+00:00
What I did
Citybound Devblog 2017-01-10T16:21:19+00:00
Building a Teensy 3.2 w/SD and 8 position DIP switch + Reset button
The Grymoire 2017-01-11T20:02:03+00:00
What I did
Citybound Devblog 2017-01-11T22:17:38+00:00
Disassembling Jak & Daxter
codersnotes.com 2017-01-12T08:00:00+00:00
What I did
Citybound Devblog 2017-01-12T21:37:12+00:00
What happened
Citybound Devblog 2017-01-13T22:47:05+00:00
What I did
Citybound Devblog 2017-01-16T22:21:15+00:00
Game #66: Ultima: Warriors of Destiny (NES) - Warriors Rushed (Finished)
The RPG Consoler 2017-01-17T04:51:00+00:00
What I did
Citybound Devblog 2017-01-18T22:36:48+00:00
What I did
Citybound Devblog 2017-01-19T17:41:32+00:00
What I did
Citybound Devblog 2017-01-20T16:33:51+00:00
What I did
Citybound Devblog 2017-01-23T22:03:48+00:00
Game #67: Gauntlet IV (Genesis) - Dragons All The Way Up (Finished)
The RPG Consoler 2017-01-24T04:15:00+00:00
What I did
Citybound Devblog 2017-01-25T22:09:04+00:00
What I did
Citybound Devblog 2017-01-27T22:11:45+00:00
Worked mostly on tools in the past weeks. They are now at an...
DeathTrash 2017-01-28T23:33:35+00:00
What I did
Citybound Devblog 2017-01-30T21:18:39+00:00
Below the Cut: LandStalker (Genesis)
The RPG Consoler 2017-01-31T16:21:00+00:00
What I did
Citybound Devblog 2017-01-31T22:16:03+00:00
Release: Citybound 0.1.1 & 0.1.2
Citybound Devblog 2017-02-02T00:08:26+00:00
News items from the new year
Evennia Devblog RSS Feed 2017-02-05T00:00:00+00:00
News items from the new year
Griatch's Evennia musings 2017-02-05T12:21:00+00:00
January '17 Review & February Plans
Citybound Devblog 2017-02-07T19:55:43+00:00
How web bloat impacts users with slow connections
Dan Luu 2017-02-08T00:00:00+00:00
Bash hostname completion
Arabesque 2017-02-10T10:32:17+00:00
“Nope.”
DeathTrash 2017-02-10T12:42:44+00:00
What I thought about
Citybound Devblog 2017-02-14T19:04:18+00:00
Building NES homebrew with makechr.exe
dustmop.io blog 2017-02-16T18:53:56+00:00
What I thought about
Citybound Devblog 2017-02-17T18:49:08+00:00
Shell from vi
Arabesque 2017-02-18T10:46:56+00:00
Mother is waiting.
DeathTrash 2017-02-19T07:32:05+00:00
What I did
Citybound Devblog 2017-02-22T22:08:34+00:00
Dragon taming with Tailbiter, a bytecode compiler for Python
Code Words 2017-02-23T12:00:00+00:00
Game #68: Inindo: Way of the Ninja (SNES) - The Wrong Way
The RPG Consoler 2017-02-23T16:52:00+00:00
Turns out it's difficult
Citybound Devblog 2017-02-27T02:08:39+00:00
Eureka?
Citybound Devblog 2017-02-28T23:25:43+00:00
Game #68: Inindo: Way of the Ninja (SNES) - Gearing Up
The RPG Consoler 2017-03-02T05:00:00+00:00
Down the Rabbit Hole
Citybound Devblog 2017-03-07T22:56:20+00:00
Beautiful new words to describe obscure emotions
The Dictionary of Obscure Sorrows 2017-03-10T16:29:27+00:00
Let the Battle Begin
Citybound Devblog 2017-03-14T11:05:29+00:00
A Glimpse of Game and Interaction Design in Citybound
Citybound Devblog 2017-03-14T22:28:20+00:00
What I did the last days
Citybound Devblog 2017-03-19T23:42:50+00:00
LetsEncrypt + Amazon EC2 = SSLLabs A Rating
The Grymoire 2017-03-24T15:20:28+00:00
Moving countries again...
Citybound Devblog 2017-03-27T22:23:36+00:00
With every click of the shutter, you’re trying to press...
The Dictionary of Obscure Sorrows 2017-03-28T14:05:11+00:00
Water shader.
DeathTrash 2017-04-01T15:45:50+00:00
What I did the last days
Citybound Devblog 2017-04-04T07:52:39+00:00
What I did today
Citybound Devblog 2017-04-07T00:09:35+00:00
Game #68: Inindo: Way of the Ninja (SNES) - Wandering Away (Finished)
The RPG Consoler 2017-04-10T17:50:00+00:00
Below the Cut: Technoclash (Genesis)
The RPG Consoler 2017-04-16T21:04:00+00:00
The origins of XXX as FIXME
Juho Snellman's Weblog 2017-04-17T18:00:00+00:00
CTIA v. Berkeley: affirmed
LESSIG Blog 2017-04-21T16:44:08+00:00
The luxury of a creative community
Evennia Devblog RSS Feed 2017-04-23T00:00:00+00:00
The luxury of a creative community
Griatch's Evennia musings 2017-04-23T20:46:00+00:00
Towards Simplicity & Actual Realism
Citybound Devblog 2017-04-25T00:00:00+00:00
Filthy Kitchen
dustmop.io blog 2017-04-27T17:51:41+00:00
Introducing Stagemaster
Citybound Devblog 2017-05-04T00:00:00+00:00
Game #69: Super Ninja Boy (SNES) - Dragon Ball Gaiden (Finished)
The RPG Consoler 2017-05-07T18:35:00+00:00
The Pain Of Linear Types In Rust - Faultlore
Faultlore 2017-05-08T00:00:00+00:00
A Day in a Table
Citybound Devblog 2017-05-10T00:00:00+00:00
Two Tiny Triumphs
Citybound Devblog 2017-05-13T00:00:00+00:00
Economy Sculpting
Citybound Devblog 2017-05-22T00:00:00+00:00
Let's do more together
Citybound Devblog 2017-06-04T00:00:00+00:00
Startup options v. cash
Dan Luu 2017-06-07T00:00:00+00:00
Stylized VFX in RIME
Simonschreibt. 2017-06-07T12:56:28+00:00
The widely cited studies on mouse vs. keyboard efficiency are completely bogus
Dan Luu 2017-06-13T00:00:00+00:00
Bugs You'll Probably Only Have In Rust - Faultlore
Faultlore 2017-06-14T00:00:00+00:00
The First "Patrons Calling" & How it Went
Citybound Devblog 2017-06-15T00:00:00+00:00
Interviewing My Top Patron (and a small status update)
Citybound Devblog 2017-06-20T00:00:00+00:00
Game #70: Great Greed (Game Boy) - The Road to Hell (Finished)
The RPG Consoler 2017-07-06T06:08:00+00:00
868-HACK: PLAN.B
Mighty Vision 2017-07-11T09:48:00+00:00
Opening Up Patrons Calling, Join the 3rd!
Citybound Devblog 2017-07-14T00:00:00+00:00
Terminal latency
Dan Luu 2017-07-18T00:00:00+00:00
Game #71: Might and Magic III: Isles of Terra (SNES) - A Glitch in Timing
The RPG Consoler 2017-07-18T04:19:00+00:00
I don't want no 'wantarray'
Juho Snellman's Weblog 2017-07-18T18:00:00+00:00
The mystery of the hanging S3 downloads
Juho Snellman's Weblog 2017-07-20T16:00:00+00:00
plan.b notes
Mighty Vision 2017-07-24T13:22:00+00:00
Game #71: Might and Magic III: Isles of Terra (SNES) - A Port for Whining
The RPG Consoler 2017-07-26T01:43:00+00:00
Welcome to pizzabox.computer
Pizza Box Computer 2017-07-31T01:01:38+00:00
Game #71: Might and Magic III: Isles of Terra (SNES) - A Note with Rhyming
The RPG Consoler 2017-08-02T03:07:00+00:00
Sneak Peek: The Open Design Doc
Citybound Devblog 2017-08-04T00:00:00+00:00
Enhanced vision, not coordination
Romain Laurent 2017-08-05T18:09:21+00:00
Game Engine Black Book Release Date
Fabien Sanglard 2017-08-07T01:08:45+00:00
Game #71: Might and Magic III: Isles of Terra (SNES) - A Key Divining
The RPG Consoler 2017-08-08T02:00:00+00:00
Sattolo's algorithm
Dan Luu 2017-08-09T00:00:00+00:00
Game #71: Might and Magic III: Isles of Terra (SNES) - A Game Unwinding (Finished)
The RPG Consoler 2017-08-14T15:47:00+00:00
What Is a Workstation
Pizza Box Computer 2017-08-18T01:30:00+00:00
Why PS4 downloads are so slow
Juho Snellman's Weblog 2017-08-19T19:00:00+00:00
3 Days of Citybound
Citybound Devblog 2017-08-20T00:00:00+00:00
The Dream
Pizza Box Computer 2017-08-20T18:52:04+00:00
Branch prediction
Dan Luu 2017-08-23T00:00:00+00:00
Digital VAXstation 4000 VLC
Pizza Box Computer 2017-08-24T00:30:00+00:00
NeXTstation mono
Pizza Box Computer 2017-08-24T00:30:00+00:00
Silicon Graphics Indy
Pizza Box Computer 2017-08-24T00:30:00+00:00
Sun SPARCstation 1+
Pizza Box Computer 2017-08-24T00:30:00+00:00
The First Four Pizzaboxes
Pizza Box Computer 2017-08-24T00:30:00+00:00
Renaming Django's Auth User and App
Evennia Devblog RSS Feed 2017-08-25T00:00:00+00:00
Renaming Django's Auth User and App
Griatch's Evennia musings 2017-08-25T21:22:00+00:00
FizzleFade
Fabien Sanglard 2017-08-28T01:08:45+00:00
Current state of the game.
DeathTrash 2017-08-30T09:28:19+00:00
Indy Power Supply Replacement
Pizza Box Computer 2017-09-02T20:20:00+00:00
The Danger Of Opinions
codersnotes.com 2017-09-03T07:00:00+00:00
Numbers and tagged pointers in early Lisp implementations
Juho Snellman's Weblog 2017-09-04T15:00:00+00:00
Cubic Liters
Romain Laurent 2017-09-05T22:34:47+00:00
Game Engine Black Book Postmortem
Fabien Sanglard 2017-09-07T01:08:45+00:00
Why Command And Vector Processors Rock
codersnotes.com 2017-09-07T07:00:00+00:00
Booting the Indy
Pizza Box Computer 2017-09-10T02:20:00+00:00
I DEMAND Bassel Khartabil’s DATE OF DEATH and REMAINS...
LESSIG Blog 2017-09-15T20:30:00+00:00
HP 9000 Model 712/60
Pizza Box Computer 2017-09-19T01:00:00+00:00
Macintosh Quadra 605
Pizza Box Computer 2017-09-19T01:00:00+00:00
Macintosh Quadra 610
Pizza Box Computer 2017-09-19T01:00:00+00:00
Power Macintosh 6100/60
Pizza Box Computer 2017-09-19T01:00:00+00:00
Two Macs and an HP
Pizza Box Computer 2017-09-19T01:00:00+00:00
Evennia 0.7 released
Evennia Devblog RSS Feed 2017-09-20T00:00:00+00:00
Evennia 0.7 released
Griatch's Evennia musings 2017-09-20T20:44:00+00:00
streak scoring redux
Mighty Vision 2017-09-21T13:36:00+00:00
Surprisingly Networking
Citybound Devblog 2017-09-25T00:00:00+00:00
Getting an Indy Desktop
Pizza Box Computer 2017-09-26T01:00:00+00:00
Cool Stuff with Textures
Simonschreibt. 2017-09-27T08:00:57+00:00
SNESPi - 3D Printed Raspberry Pi Mini SNES(s)
daftmike's blog 2017-09-28T02:51:00+00:00
Evennia in Hacktoberfest 2017
Evennia Devblog RSS Feed 2017-10-01T00:00:00+00:00
Evennia in Hacktoberfest 2017
Griatch's Evennia musings 2017-10-01T20:05:00+00:00
RustFest Recap & Patron's Calling this Friday
Citybound Devblog 2017-10-02T00:00:00+00:00
Game #72: Ninja Boy 2 (Game Boy) - Ninjas vs. Pirates, In Space! (Finished)
The RPG Consoler 2017-10-06T06:20:00+00:00
Between a rock and a hard place
Romain Laurent 2017-10-07T20:14:35+00:00
The 5th Patrons Calling, Voting for the 6th & Thoughts
Citybound Devblog 2017-10-09T00:00:00+00:00
This is why I became an Engineer
Terrible Banana 2017-10-09T00:44:13+00:00
Little Lightmap Tricks
codersnotes.com 2017-10-10T07:00:00+00:00
A tool to get your copyrights back
LESSIG Blog 2017-10-12T10:15:39+00:00
Below the Cut: Dungeon Explorer II (TurboGrafx-CD)
The RPG Consoler 2017-10-15T18:14:00+00:00
Keyboard latency
Dan Luu 2017-10-16T00:00:00+00:00
6th Patrons Calling will be on Sun 22nd 5PM CEST
Citybound Devblog 2017-10-18T00:00:00+00:00
My RustFest Talk (With Networking Demo)
Citybound Devblog 2017-10-19T00:00:00+00:00
Filesystem error handling
Dan Luu 2017-10-23T00:00:00+00:00
Something Rotten In The Core
codersnotes.com 2017-10-24T07:00:00+00:00
Getting a MUD RP scene going
Evennia Devblog RSS Feed 2017-10-29T00:00:00+00:00
Replacing a dead NVRAM chip
Pizza Box Computer 2017-10-29T00:00:00+00:00
Getting a MUD Roleplaying Scene going
Griatch's Evennia musings 2017-10-29T16:01:00+00:00
October 2017 Prototype Release
Citybound Devblog 2017-11-03T00:00:00+00:00
Dick Banana lapel pin
Terrible Banana 2017-11-04T00:56:15+00:00
Just fuck right off
Terrible Banana 2017-11-06T16:32:15+00:00
UI backwards compatibility
Dan Luu 2017-11-09T00:00:00+00:00
How out of date are Android devices?
Dan Luu 2017-11-12T00:00:00+00:00
How good are decisions? Evaluating decision quality in domains where evaluation is easy
Dan Luu 2017-11-21T00:00:00+00:00
Booting the SPARCstation
Pizza Box Computer 2017-11-21T21:00:00+00:00
Data General AViiON AV/300D
Pizza Box Computer 2017-11-21T21:00:00+00:00
DEC 3000 300X
Pizza Box Computer 2017-11-21T21:00:00+00:00
Digital AlphaStation 200 4/233
Pizza Box Computer 2017-11-21T21:00:00+00:00
Digital Multia
Pizza Box Computer 2017-11-21T21:00:00+00:00
IBM RS/6000 POWERstation Model 250
Pizza Box Computer 2017-11-21T21:00:00+00:00
Three new RISC boxes
Pizza Box Computer 2017-11-21T21:00:00+00:00
November 2017 Prototype Release
Citybound Devblog 2017-12-06T00:00:00+00:00
Newsletter #8 - Open up the Windows
Neovim 2017-12-16T00:00:00+00:00
Computer latency: 1977-2017
Dan Luu 2017-12-24T00:00:00+00:00
untitled
Mighty Vision 2017-12-24T18:17:00+00:00
Cinco Paus - dev notes
Mighty Vision 2017-12-27T15:13:00+00:00
New year, new stuff
Evennia Devblog RSS Feed 2018-01-05T00:00:00+00:00
New year, new stuff
Griatch's Evennia musings 2018-01-05T10:29:00+00:00
In Search Of The Lost Program
codersnotes.com 2018-01-11T08:00:00+00:00
Self awareness
Romain Laurent 2018-01-15T12:40:13+00:00
untitled
Mighty Vision 2018-01-17T18:19:00+00:00
Metasploit+Amazon SES, or debugging Sendmail’s SMTP Authentication
The Grymoire 2018-01-17T19:29:32+00:00
Bus Pirate Cables – which is the best?
The Grymoire 2018-01-18T14:38:44+00:00
Game #73: Dungeon Master (SNES) - Dungeon Meat of Doom!
The RPG Consoler 2018-01-22T03:10:00+00:00
Kicking into gear from a distance
Evennia Devblog RSS Feed 2018-01-27T00:00:00+00:00
Kicking into gear from a distance
Griatch's Evennia musings 2018-01-27T22:27:00+00:00
How I spent December and January
Citybound Devblog 2018-02-04T00:00:00+00:00
7th Patrons Calling will be on Sun 11th 5PM CET
Citybound Devblog 2018-02-09T00:00:00+00:00
A PowerMac Surprise!
Pizza Box Computer 2018-02-14T02:00:00+00:00
Unifying Road & Zoning UI
Citybound Devblog 2018-03-11T00:00:00+00:00
scabulous
The Dictionary of Obscure Sorrows 2018-03-12T13:06:00+00:00
Altschmerz
The Dictionary of Obscure Sorrows 2018-03-12T13:19:30+00:00
onism
The Dictionary of Obscure Sorrows 2018-03-12T13:21:56+00:00
March '18 Hackathon
Citybound Devblog 2018-03-17T00:00:00+00:00
Booting the Multia
Pizza Box Computer 2018-03-17T21:00:00+00:00
Conclusions about the Hackathon & Livestreaming
Citybound Devblog 2018-03-27T00:00:00+00:00
Fsyncgate: errors on fsync are unrecoverable
Dan Luu 2018-03-28T00:00:00+00:00
Steel Survivor: an IBM XT Tale
int10h.org - VileR's blog 2018-03-29T19:58:37+00:00
LED Matrix Animation Frame
daftmike's blog 2018-03-31T15:53:00+00:00
Phlogiston preview
Mighty Vision 2018-04-04T12:21:00+00:00
10 Minute Mod: GameBoy Screen Rot Fix(?)
daftmike's blog 2018-04-08T22:17:00+00:00
A Dusting of Gamification
Joel on Software 2018-04-13T13:40:21+00:00
What the Zoning Prototype Will Bring
Citybound Devblog 2018-04-22T00:00:00+00:00
Strange and maddening rules
Joel on Software 2018-04-23T14:42:45+00:00
Small update
DeathTrash 2018-04-23T17:23:08+00:00
The Giffinator – Technical Details
dustmop.io blog 2018-04-26T18:59:53+00:00
Booting the HP 712
Pizza Box Computer 2018-04-28T22:00:00+00:00
Stylized VFX in RIME – Water Edition
Simonschreibt. 2018-05-01T11:18:30+00:00
Announcing Stack Overflow for Teams
Joel on Software 2018-05-03T12:58:25+00:00
Cleaning the VAXstation
Pizza Box Computer 2018-05-04T02:40:00+00:00
So Long, Blogspot
int10h.org - VileR's blog 2018-05-05T13:45:28+00:00
Death Trash
DeathTrash 2018-05-05T15:02:54+00:00
Zelda – The Bling-Bling Offset
Simonschreibt. 2018-05-05T20:19:35+00:00
pâron. the feeling that no matter what you do is always somehow...
The Dictionary of Obscure Sorrows 2018-05-15T12:22:03+00:00
⋇⋇
Romain Laurent 2018-05-17T01:36:01+00:00
Fun with Macros: Gathering
Steve Losh 2018-05-21T16:05:00+00:00
midding
The Dictionary of Obscure Sorrows 2018-05-22T08:48:29+00:00
Flexi IBM VGA Font: a Scalable Take on Text Mode
int10h.org - VileR's blog 2018-05-22T21:01:21+00:00
imbroglio notes 13 - phlogiston
Mighty Vision 2018-05-24T15:55:00+00:00
Taking Decent Photos of your CRT TV Screen
int10h.org - VileR's blog 2018-06-16T20:37:28+00:00
Game #73: Dungeon Master (SNES) - All Things in Moderation, Including Moderation (Finished)
The RPG Consoler 2018-06-19T02:49:00+00:00
3D-Printed (Baby) Drum Pedal
daftmike's blog 2018-07-02T19:02:00+00:00
A yak shave with SGI's EFS
Pizza Box Computer 2018-07-04T19:00:00+00:00
Digital DECstation 5000/200
Pizza Box Computer 2018-07-07T18:00:00+00:00
Filling in more cracks
Pizza Box Computer 2018-07-07T18:00:00+00:00
HP 9000 Model 425e
Pizza Box Computer 2018-07-07T18:00:00+00:00
Sun SPARCstation 20
Pizza Box Computer 2018-07-07T18:00:00+00:00
Sun SunBlade 150
Pizza Box Computer 2018-07-07T18:00:00+00:00
Fun with Macros: If-Let and When-Let
Steve Losh 2018-07-09T16:00:00+00:00
Game #74: Sorcerer's Kingdom (Genesis) - Understandably Forgotten (Finished)
The RPG Consoler 2018-07-11T03:26:00+00:00
Photo
Terrible Banana 2018-07-12T04:07:50+00:00
Hey “anonymous” (where is trademark law when you need it?), f*ck you.
LESSIG Blog 2018-07-12T12:27:47+00:00
Optimizing a breadth-first search
Juho Snellman's Weblog 2018-07-23T16:00:00+00:00
(⊙_◎)
Romain Laurent 2018-07-30T06:41:49+00:00
Portable Lumipen: Latest project from Ishikawa Senoo Laboratory...
prosthetic knowledge 2018-07-31T22:09:54+00:00
APPARATUM: Musical interactive interface from pangenerator is...
prosthetic knowledge 2018-08-01T22:41:40+00:00
BRUTE: German wine maker whose branding (put together by Patrik...
prosthetic knowledge 2018-08-03T16:14:41+00:00
The Reeplicator AI: Project by Rama Allen of The Mill creates a...
prosthetic knowledge 2018-08-05T20:29:37+00:00
Ultra Mary: Project by Anastasia Alekhina is a collection of LED...
prosthetic knowledge 2018-08-05T21:16:01+00:00
Sunday evening mood
Romain Laurent 2018-08-06T01:30:34+00:00
Neural Beatbox: Coding project from Nao Tokui uses neural...
prosthetic knowledge 2018-08-07T15:41:03+00:00
The Alternative Late Show with Stephen Colbert: Short video...
prosthetic knowledge 2018-08-07T21:13:59+00:00
TVCGAFIX Utilities - Adjust CGA Output for TV
int10h.org - VileR's blog 2018-08-07T22:51:28+00:00
After the storm
Romain Laurent 2018-08-08T18:47:24+00:00
The Barcoders: Musical project featuring an ensemble including Ei...
prosthetic knowledge 2018-08-08T22:40:58+00:00
Fast Pix2Pix: Project from Zaid Alyafeai presents a faster...
prosthetic knowledge 2018-08-13T16:03:32+00:00
Text to Image: Latest web-based project from Cristóbal...
prosthetic knowledge 2018-08-15T21:32:03+00:00
prostheticknowledge: Fast Pix2Pix Project from Zaid Alyafeai...
prosthetic knowledge 2018-08-17T16:45:05+00:00
Inline building in upcoming Evennia 0.8
Evennia Devblog RSS Feed 2018-08-18T00:00:00+00:00
Inline building in upcoming Evennia 0.8
Griatch's Evennia musings 2018-08-18T12:23:00+00:00
Dynamic density shaping of photokinetic E. coli: Research from...
prosthetic knowledge 2018-08-18T13:53:43+00:00
Recycle-GAN: Graphics research from Carnegie Mellon’s School of...
prosthetic knowledge 2018-08-18T20:52:30+00:00
Inventory: Project from Oddviz is a collection of images composed...
prosthetic knowledge 2018-08-20T14:06:24+00:00
Video-to-Video Synthesis: Amazing graphics research from @nvidia...
prosthetic knowledge 2018-08-20T15:41:27+00:00
don’t do it mancall the national suicide prevention hotline at...
Terrible Banana 2018-08-22T04:40:11+00:00
Everybody Dance Now: Graphics research from UC Berkeley is the...
prosthetic knowledge 2018-08-24T12:45:50+00:00
A Road to Common Lisp
Steve Losh 2018-08-27T15:50:00+00:00
What The Hell Was The Microsoft Network?
codersnotes.com 2018-08-29T07:00:00+00:00
1: Michael Drogalis on Pyrostore's Acquisition, the future of Onyx, and stream processing
The REPL 2018-08-29T08:38:56+00:00
AIRBNB HOSTS: Online game by Dries Depoorter and David...
prosthetic knowledge 2018-08-29T17:21:20+00:00
Fast Pix2Pix - Update: Interesting additions to project by Zaid...
prosthetic knowledge 2018-08-30T14:20:05+00:00
Photo
.mattfraction 2018-09-01T00:00:27+00:00
photos-of-space: Active Prominences on a Quiet Sun (Photo: Alan...
.mattfraction 2018-09-02T00:00:36+00:00
Time to call it a day ...
prosthetic knowledge 2018-09-05T21:43:49+00:00
The Architecture of the Medieval Page
medievalbooks 2018-09-07T16:36:36+00:00
2: Daniel Higginbotham on Specmonstah, Clojure Spec, and Ent walking trees
The REPL 2018-09-10T09:45:06+00:00
casualmenofaction: mattfractionblog: In the last two shoots...
.mattfraction 2018-09-10T16:31:39+00:00
kierongillen: die-comic: For more information go read the first...
.mattfraction 2018-09-10T23:33:53+00:00
Apple's topsy-turvy iPhone lineup
Article on Coyote Cartography 2018-09-13T18:20:57+00:00
3: Mike Fikes on ClojureScript type inference, Graal, and Clojurists Together
The REPL 2018-09-16T12:29:45+00:00
speakingparts: Les garçons sauvages, Bertrand Mandico 2017
.mattfraction 2018-09-17T00:00:32+00:00
untitled
.mattfraction 2018-09-17T01:49:06+00:00
Me, Myself, and I: The Story of Two Medieval Selfies
medievalbooks 2018-09-20T17:52:09+00:00
Introducing Live Builds
Citybound Devblog 2018-09-21T00:00:00+00:00
abandonedandurbex: Stairwell in an abandoned button factory
.mattfraction 2018-09-23T00:00:39+00:00
Bloated
Fabien Sanglard 2018-09-23T01:08:45+00:00
oldshowbiz: Don’t let anybody tell you that there were no women...
.mattfraction 2018-09-24T00:00:23+00:00
nevver: David Shrigley
.mattfraction 2018-09-25T00:00:26+00:00
4: Bruce Hauman on interactive development, Figwheel, and Rebel Readline
The REPL 2018-09-25T20:39:22+00:00
abandonedandurbex: 132 year old rifle that was found leaning up...
.mattfraction 2018-09-26T00:00:36+00:00
Photo
.mattfraction 2018-09-27T00:00:37+00:00
bushdog: (from 1958 Danelectro U3 - Thunder Road Guitars Seattle)
.mattfraction 2018-09-28T00:00:25+00:00
seanhowe: Advertisement for the Psychedelicatessen, 164 Avenue...
.mattfraction 2018-09-29T00:00:43+00:00
HELLO COMBLIUMIBUS HELLO CXC @skellyskellyskelly x me DROP HOT...
.mattfraction 2018-09-29T14:21:01+00:00
Evennia 0.8 released
Evennia Devblog RSS Feed 2018-09-30T00:00:00+00:00
biomorphosis: When you flip bats upside down they become...
.mattfraction 2018-09-30T00:00:35+00:00
sorry errbody @ cxc. i spent the previous evening vomiting more...
.mattfraction 2018-09-30T11:49:30+00:00
Evennia 0.8 released
Griatch's Evennia musings 2018-09-30T19:35:00+00:00
oldshowbiz: The Burt Reynolds Late Show
.mattfraction 2018-10-01T00:00:15+00:00
5: Looking At The Web After Tomorrow with Nikita Prokopov
The REPL 2018-10-03T22:15:03+00:00
Evennia in Hacktoberfest 2018
Griatch's Evennia musings 2018-10-04T08:34:00+00:00
Doodles in Medieval Manuscripts
medievalbooks 2018-10-05T18:46:26+00:00
Notes on Type Layouts and ABIs in Rust - Faultlore
Faultlore 2018-10-09T00:00:00+00:00
6: Thomas Heller on Shadow CLJS
The REPL 2018-10-10T03:49:03+00:00
wildragon: Probably Flinthook’s Bounty Battle storyline in a...
Tribute Games 2018-10-17T20:00:17+00:00
7: Ben Brinckerhoff on Clojure Spec and Error Messages
The REPL 2018-10-18T01:09:54+00:00
What does Stack Overflow want to be when it grows up?
Coding Horror 2018-10-22T10:52:32+00:00
8: Elana Hashman on Debian and Clojure
The REPL 2018-10-24T21:21:37+00:00
9: Hannah Henderson on Continuous Integration at CircleCI
The REPL 2018-11-02T03:31:00+00:00
Medieval Book Carousels
medievalbooks 2018-11-02T16:54:17+00:00
10: Howard Lewis Ship on GraphQL and Lacinia
The REPL 2018-11-12T09:32:59+00:00
The Kinds of Implementation-Defined? - Faultlore
Faultlore 2018-11-13T00:00:00+00:00
glamoramamama75:
.mattfraction 2018-11-13T09:33:46+00:00
Low cognitive load blogging
Article on Coyote Cartography 2018-11-15T00:40:53+00:00
The new iPad Pro
Article on Coyote Cartography 2018-11-16T19:28:03+00:00
11: Saskia Lindner on re-frame-10x, compassionate coding, and mindfulness
The REPL 2018-11-25T19:00:00+00:00
12: Clojure documentation with Martin Klepsch
The REPL 2018-11-27T19:00:00+00:00
The Cluster Clan is a colorful bunch of swashbucklers! See...
Tribute Games 2018-11-28T19:06:49+00:00
Photo
Terrible Banana 2018-11-29T21:23:12+00:00
The 12 Days of Tribute Giveaway – How To Win: Starting December...
Tribute Games 2018-12-01T17:00:38+00:00
Installing A/UX on the Quadra 610
Pizza Box Computer 2018-12-02T21:45:00+00:00
13: High performance Clojure numerics with Chris Nuernberger
The REPL 2018-12-04T19:00:00+00:00
Let's talk about the Tumblrpocalypse
Article on Coyote Cartography 2018-12-04T20:57:52+00:00
14: ClojureScript, Lumo, and Lambdas with Antonio Monteiro
The REPL 2018-12-05T02:23:28+00:00
Game Engine Black Book: Wolfenstein 3D, 2nd Edition
Fabien Sanglard 2018-12-06T00:00:00+00:00
tributegames: The 12 Days of Tribute Giveaway – How To...
Tribute Games 2018-12-06T16:13:02+00:00
Vimways: From .vimrc to .vim
Arabesque 2018-12-08T08:50:17+00:00
Game Engine Black Book: DOOM
Fabien Sanglard 2018-12-10T00:00:00+00:00
FUCK YOU TUMBLR NO PANTS ON NO FUCKS GIVEN YOU DONT BAN ME I BAN YOU EAT MY WHOLE COOKIE ASS…
.mattfraction 2018-12-10T12:15:26+00:00
Vimways: Runtime hackery
Arabesque 2018-12-10T21:27:24+00:00
How the Dreamcast copy protection was defeated
Fabien Sanglard 2018-12-11T00:00:00+00:00
🔴 NOW LIVE! ➡️ www.twitch.tv/tributegames 12 Days of Tribute...
Tribute Games 2018-12-13T21:00:26+00:00
Ninja Senki Now Available!
Tribute Games 2018-12-20T15:00:24+00:00
Get up to 80% OFF Flinthook, Curses ‘N Chaos, Mercenary Kings:...
Tribute Games 2018-12-20T20:00:29+00:00
Deciphering the postcard sized raytracer
Fabien Sanglard 2018-12-24T00:00:00+00:00
How DOOM fire was made
Fabien Sanglard 2018-12-28T00:00:00+00:00
Into 2019
Evennia Devblog RSS Feed 2019-01-02T00:00:00+00:00
Into 2019!
Griatch's Evennia musings 2019-01-02T19:22:00+00:00
River 2k19 Edition
Simonschreibt. 2019-01-07T08:51:02+00:00
Toggle Redshift with Keyboard Shortcut
Winny's Blog 2019-01-09T13:04:00+00:00
15: Clojure at Apple with David Taylor
The REPL 2019-01-11T19:14:56+00:00
Publishing with org-static-blog
Winny's Blog 2019-01-11T21:00:00+00:00
GNU C Style
Winny's Blog 2019-01-13T06:00:00+00:00
Compartmentalization
Romain Laurent 2019-01-14T19:13:24+00:00
quiet blog year
Mighty Vision 2019-01-17T00:20:00+00:00
16: Monorepos and monologues with Alex Engelberg
The REPL 2019-01-18T18:46:47+00:00
A bit of a stretch
Romain Laurent 2019-01-18T23:26:23+00:00
Blink Shell: First Thoughts
Winny's Blog 2019-01-23T03:35:00+00:00
The Oldest Surviving Printed Advertisement in English (London, 1477)
medievalbooks 2019-01-24T18:27:27+00:00
Emotional Flow
Romain Laurent 2019-01-24T20:05:51+00:00
Fighting For A Miracle: Venezuela’s War Between Past, Present, and Future
Not My Empire 2019-01-27T03:07:57+00:00
17: Editing Clojure code with Shaun Lebron
The REPL 2019-02-04T06:30:00+00:00
insidematthieu: A fanimation of Leo and Lea from Curses and...
Tribute Games 2019-02-05T15:53:19+00:00
A digression about Facebook
Article on Coyote Cartography 2019-02-05T16:54:43+00:00
untitled
Article on Coyote Cartography 2019-02-05T21:48:10+00:00
untitled
Article on Coyote Cartography 2019-02-13T15:47:45+00:00
📢 IMPORTANT NEWS 📢 We are excited to announce that we will have...
Tribute Games 2019-02-13T21:25:30+00:00
The Cloud Is Just Someone Else’s Computer
Coding Horror 2019-02-17T02:15:26+00:00
Randomized trial on gender in Overwatch
Dan Luu 2019-02-19T00:00:00+00:00
18: Testing Clojure and ClojureScript with Arne Brasseur
The REPL 2019-02-20T03:54:45+00:00
Ilhan Omar Is Fighting The Real Anti-Semites
Not My Empire 2019-03-09T15:47:14+00:00
19: Formatting Clojure code with Shaun Lebron
The REPL 2019-03-12T18:00:00+00:00
New Upcoming Game: Panzer Paladin
Tribute Games 2019-03-13T15:00:26+00:00
Panzer Paladin Announced by Tribute Games
Tribute Games 2019-03-14T16:06:36+00:00
Google Summer of Code 2019
Neovim 2019-03-17T00:00:00+00:00
20: Clojure MXNet with Carin Meier
The REPL 2019-03-19T18:00:00+00:00
Why Hashbrown Does A Double-Lookup - Faultlore
Faultlore 2019-03-20T00:00:00+00:00
Irony and the Alt-Right: What the Christchurch Shooter Tells Us About Belief
Not My Empire 2019-03-21T23:58:31+00:00
Ten (More) Brief Thoughts On Russiagate
Not My Empire 2019-03-26T00:46:31+00:00
21: Looking at Clojure through the mindset of business with Jonathan Boston
The REPL 2019-03-26T18:00:00+00:00
The next CEO of Stack Overflow
Joel on Software 2019-03-28T14:00:53+00:00
The story of the Rendition Vérité 1000
Fabien Sanglard 2019-04-01T00:00:00+00:00
22: Cursive IDE with Colin Fleming
The REPL 2019-04-02T18:19:46+00:00
The story of the 3dfx Voodoo 1
Fabien Sanglard 2019-04-04T00:00:00+00:00
pixelartus: Panzer Paladin, is an upcoming action platformer in...
Tribute Games 2019-04-04T21:07:37+00:00
Medium thinks it's a brand
Article on Coyote Cartography 2019-04-13T16:40:36+00:00
23: Elements of Clojure with Zach Tellman
The REPL 2019-04-18T23:06:18+00:00
Steaming on Eating Jam
Evennia Devblog RSS Feed 2019-04-25T00:00:00+00:00
Steaming on, eating jam
Griatch's Evennia musings 2019-04-25T09:42:00+00:00
Podcast about Evennia
Evennia Devblog RSS Feed 2019-05-09T00:00:00+00:00
Podcast about Evennia
Griatch's Evennia musings 2019-05-09T13:38:00+00:00
Celebrating 8 years of Tribute!
Tribute Games 2019-05-09T15:00:30+00:00
Writing a procedural puzzle generator
Juho Snellman's Weblog 2019-05-14T15:00:00+00:00
De-uglifying 40-Column Text Games for VGA
int10h.org - VileR's blog 2019-05-15T22:16:53+00:00
Game Engine Black Book update
Fabien Sanglard 2019-05-17T00:00:00+00:00
Creating Evscaperoom Part 1
Evennia Devblog RSS Feed 2019-05-18T00:00:00+00:00
Random (but not angry) thoughts on "Game of Thrones"
Article on Coyote Cartography 2019-05-18T02:58:26+00:00
Creating Evscaperoom, part 1
Griatch's Evennia musings 2019-05-18T18:50:00+00:00
Here's My Type, So Initialize Me Maybe - Faultlore
Faultlore 2019-05-21T00:00:00+00:00
Fontraption (a VGA Text Mode Font Editor)
int10h.org - VileR's blog 2019-05-22T20:56:31+00:00
Creating Evscaperoom Part 2
Evennia Devblog RSS Feed 2019-05-26T00:00:00+00:00
Creating Evscaperoom, part 2
Griatch's Evennia musings 2019-05-26T09:03:00+00:00
An Exercise Program for the Fat Web
Coding Horror 2019-05-30T11:04:52+00:00
Citybound as a Truly Moddable and Educational Simulation
Citybound Devblog 2019-06-02T00:00:00+00:00
24: Crux, a new bitemporal database from JUXT
The REPL 2019-06-12T06:27:11+00:00
Installing pyftdi on Ubuntu 18.04 for FT232H and FT2232H boards
The Grymoire 2019-06-12T14:11:45+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T06:39:20+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T06:48:20+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T07:15:32+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T08:10:02+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T08:18:13+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T08:33:10+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T08:40:08+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T08:40:22+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-20T08:40:32+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-06-21T05:32:22+00:00
25: Dragan Djuric on Neanderthal
The REPL 2019-06-26T07:00:00+00:00
Evennia 0.9 released
Evennia Devblog RSS Feed 2019-07-04T00:00:00+00:00
Evennia 0.9 released
Griatch's Evennia musings 2019-07-04T17:45:00+00:00
26: Nathan Marz on a new programming paradigm
The REPL 2019-07-10T07:00:00+00:00
Files are fraught with peril
Dan Luu 2019-07-12T00:00:00+00:00
Review: Brydge 12.9″ Keyboard Pro
Article on Coyote Cartography 2019-07-12T14:43:43+00:00
P1 SELECT
Mighty Vision 2019-07-13T17:22:00+00:00
Photo
Terrible Banana 2019-07-14T17:11:12+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-07-16T09:17:28+00:00
For every one drawing I scan, there’s at least 10 that I...
HIGHLIGHTER AND SHARPIE PARTY 2019-07-19T08:20:10+00:00
27: Eric Normand on teaching Clojure
The REPL 2019-07-24T07:00:00+00:00
Swisstable, a Quick and Dirty Description - Faultlore
Faultlore 2019-07-27T00:00:00+00:00
Open URL in existing Qutebrowser from Emacs Daemon on Gentoo
Winny's Blog 2019-07-28T05:00:00+00:00
The Danger of fuzzy matching over one's PATH
Winny's Blog 2019-08-02T11:00:00+00:00
👁
Romain Laurent 2019-08-06T20:32:37+00:00
28: Ambrose Bonnaire-Sergeant on Typed Clojure
The REPL 2019-08-12T05:28:41+00:00
The iPad needs more focus on the little things
Article on Coyote Cartography 2019-08-12T15:36:07+00:00
Happy Birthday Scott Pilgrim vs. The World! 08/13/2010 What if...
Tribute Games 2019-08-13T16:45:31+00:00
Electric Geek Transportation Systems
Coding Horror 2019-08-20T11:35:16+00:00
Making and Tool-Making
Citybound Devblog 2019-08-24T00:00:00+00:00
Tool Making Follow-Up: What I Mean by Friction
Citybound Devblog 2019-08-25T00:00:00+00:00
29: Marc O'Morain on adding Windows support to CircleCI
The REPL 2019-08-26T17:00:00+00:00
What Remains Technical Breakdown
dustmop.io blog 2019-09-10T17:42:32+00:00
lilo
The Dictionary of Obscure Sorrows 2019-09-11T20:16:05+00:00
The Rise of the Electric Scooter
Coding Horror 2019-09-12T07:24:32+00:00
Trailer
Verb Your Enthusiasm 2019-09-23T00:49:51+00:00
Welcome, Prashanth!
Joel on Software 2019-09-24T14:00:17+00:00
Death Trash will enter Steam Early Access soon. We have some...
DeathTrash 2019-09-25T06:15:35+00:00
Text Rendering Hates You - Faultlore
Faultlore 2019-09-28T00:00:00+00:00
Today, Impeachment; Tomorrow, Riot!
Not My Empire 2019-09-28T21:12:20+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T09:47:47+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T09:48:47+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T09:49:35+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T09:50:11+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T09:50:40+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T09:58:38+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T10:20:38+00:00
Photo
HIGHLIGHTER AND SHARPIE PARTY 2019-09-29T10:44:09+00:00
Blackifying and fixing bugs
Evennia Devblog RSS Feed 2019-09-30T00:00:00+00:00
Blackifying and fixing bugs
Griatch's Evennia musings 2019-09-30T15:39:00+00:00
Globalisms, Real And Imagined: Hong Kong, Haiti, And The New Internationals
Not My Empire 2019-10-06T02:26:11+00:00
Photo
Terrible Banana 2019-10-06T14:47:05+00:00
2019 Episode 1
Verb Your Enthusiasm 2019-10-10T19:51:18+00:00
Big Mouth, Little Fascisms
Not My Empire 2019-10-11T22:05:11+00:00
2019 Episode 2
Verb Your Enthusiasm 2019-10-20T04:04:21+00:00
Trump Is No Isolationist, He Just Hates Democracy
Not My Empire 2019-10-22T01:16:20+00:00
30: Bobby Calderwood on Kafka and Fintech
The REPL 2019-10-22T01:36:33+00:00
Release 2.5.2
The Ground Gives Way 2019-10-26T15:19:39+00:00
A trip down NBA Jam graphics pipeline
Fabien Sanglard 2019-10-28T00:00:00+00:00
The Facebook Crisis Is Bigger Than Fact-Checking
Not My Empire 2019-10-28T23:02:34+00:00
Track down basic Emacs bugs & hangs
Winny's Blog 2019-11-01T05:00:00+00:00
Breaking Bad: The Incomplete History of the St Albans Bible
medievalbooks 2019-11-01T16:47:30+00:00
On CTIA v. City of Berkeley
LESSIG Blog 2019-11-02T12:17:35+00:00
GDG Milwaukee 2019 DevFest - We participated!
Winny's Blog 2019-11-06T07:39:00+00:00
How Swift Achieved Dynamic Linking Where Rust Couldn't - Faultlore
Faultlore 2019-11-07T00:00:00+00:00
2019 Episode 3
Verb Your Enthusiasm 2019-11-08T02:19:46+00:00
31: Joel Holdbrooks on Meander
The REPL 2019-11-08T03:30:04+00:00
2019 Episode 4
Verb Your Enthusiasm 2019-11-11T15:49:27+00:00
32: Clojure, Kafka, and OPERATR with Derek Troy-West
The REPL 2019-11-13T06:00:00+00:00
Won MSOE x Google Cloud Hackathon
Winny's Blog 2019-11-19T01:49:00+00:00
current projects
Mighty Vision 2019-11-22T21:41:00+00:00
33: Peter Strömberg on Calva, a Clojure plugin for VS Code
The REPL 2019-11-23T01:44:06+00:00
Milwaukee Code Camp
Winny's Blog 2019-11-24T01:24:00+00:00
On Making a Pizza Delivery Game. I recently finished up working on a prototype of a small pizza...
jordan orelli 2019-11-29T15:43:14+00:00
This blog would have been 10 years old today (it’s still retired). 10 years covering the subject of...
prosthetic knowledge 2019-12-01T21:18:05+00:00
Strike Commander: Interview with Frank Savage
Fabien Sanglard 2019-12-03T00:00:00+00:00
So, how’s that retirement thing going, anyway?
Joel on Software 2019-12-05T22:51:39+00:00
Using bash to monitor devices entering/exiting a LAN
The Grymoire 2019-12-09T16:41:16+00:00
Reverse Engineering the DirecTV App’s DVR Authentication
Neglected Potential 2019-12-19T19:45:14+00:00
How I'm Implementing Procedural Architecture
Citybound Devblog 2019-12-21T00:00:00+00:00
The Research That Goes Into Citybound
Citybound Devblog 2019-12-22T00:00:00+00:00
Why I'm moving from Patreon to Github Sponsors
Citybound Devblog 2019-12-23T00:00:00+00:00
How to fix early framebuffer problems, or "Can I type my disk password yet??"
Winny's Blog 2019-12-25T08:37:00+00:00
Apparently a new line of attack against Medicare for All is that the Medicare reimbursement rate is…
Squashed 2019-12-26T20:57:17+00:00
small progress update
Mighty Vision 2019-12-28T18:04:00+00:00
The Polygons of Another World
Fabien Sanglard 2020-01-01T00:00:00+00:00
The Polygons of Another World: Amiga
Fabien Sanglard 2020-01-02T00:00:00+00:00
The Polygons of Another World: Atari ST
Fabien Sanglard 2020-01-03T00:00:00+00:00
The Polygons of Another World: PC DOC
Fabien Sanglard 2020-01-04T00:00:00+00:00
Algorithms interviews: theory vs. practice
Dan Luu 2020-01-05T00:00:00+00:00
The Polygons of Another World: Genesis
Fabien Sanglard 2020-01-05T00:00:00+00:00
Photo
Terrible Banana 2020-01-05T21:23:45+00:00
Photo
Terrible Banana 2020-01-06T21:52:10+00:00
There Is No Case For War With Iran
Not My Empire 2020-01-07T04:12:07+00:00
Switching website to GitLab Pages
Winny's Blog 2020-01-07T19:16:00+00:00
The Polygons of Another World: SNES
Fabien Sanglard 2020-01-19T00:00:00+00:00
this guy sucks
Terrible Banana 2020-01-20T23:46:34+00:00
NeXTstep on the HP 712 Part 1: Installation
Pizza Box Computer 2020-01-21T01:20:00+00:00
The Polygons of Another World: GBA
Fabien Sanglard 2020-01-26T00:00:00+00:00
95%-ile isn't that good
Dan Luu 2020-02-07T00:00:00+00:00
Photo
Terrible Banana 2020-02-08T04:07:21+00:00
Suspicious discontinuities
Dan Luu 2020-02-18T00:00:00+00:00
A Dozen Small Games
jordan orelli 2020-02-23T18:25:04+00:00
PANZER PALADIN Coming Soon to Nintendo Switch and Steam! Click...
Tribute Games 2020-02-25T16:01:23+00:00
Photo
garfield minus garfield 2020-03-01T14:24:59+00:00
On the eve of Super Tuesday...
Squashed 2020-03-02T22:51:45+00:00
The growth of command line options, 1979-Present
Dan Luu 2020-03-03T00:00:00+00:00
Why All of Bernie's Supporters should vote for Warren Instead Because of Math
Squashed 2020-03-03T01:34:14+00:00
The beautiful machine
Fabien Sanglard 2020-03-06T00:00:00+00:00
How (some) good corporate engineering blogs are written
Dan Luu 2020-03-11T00:00:00+00:00
The Polygons of Another World: Jaguar
Fabien Sanglard 2020-03-13T00:00:00+00:00
GTA V – The Wormy Fountain
Simonschreibt. 2020-03-20T22:10:01+00:00
34: CIDER and tending the Orchard with Bozhidar Batsov
The REPL 2020-03-24T19:00:00+00:00
The Polygons of DOOM: PSX
Fabien Sanglard 2020-03-26T00:00:00+00:00
Ideas for Upcoming Livestreams (Pedestrians & Epidemics)
Citybound Devblog 2020-03-28T00:00:00+00:00
Extending a wireless LAN with a bridged Ethernet LAN using Mikrotik RouterOS
Winny's Blog 2020-03-29T17:23:00+00:00
35: Mature Clojure codebases with Łukasz Korecki
The REPL 2020-04-01T23:17:03+00:00
Crafting “Crafting Interpreters”
journal.stuffwithstuff.com 2020-04-05T07:00:00+00:00
Spring updates while trying to stay healthy
Evennia Devblog RSS Feed 2020-04-14T00:00:00+00:00
Newsletter #9 - Three's company
Neovim 2020-04-14T00:00:00+00:00
Spring updates while trying to stay healthy
Griatch's Evennia musings 2020-04-14T16:31:00+00:00
Auto-Injecting Files into an Active PCem/86Box Machine
int10h.org - VileR's blog 2020-04-15T09:19:00+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2020-04-17T13:14:23+00:00
A week in the life of Winston
Winny's Blog 2020-04-19T00:27:00+00:00
Building a PC, Part IX: Downsizing
Coding Horror 2020-04-19T23:56:03+00:00
The Making Of Stunt Island
Fabien Sanglard 2020-04-21T00:00:00+00:00
Debugging Zathura, GTK (don't forget about seccomp)
Winny's Blog 2020-04-25T03:49:00+00:00
36: Clojure CLI tools with Michiel Borkent
The REPL 2020-04-25T21:15:43+00:00
Revisiting the Businesscard Raytracer
Fabien Sanglard 2020-05-01T00:00:00+00:00
Linux dmesg –follow (-w) not working?
Winny's Blog 2020-05-01T02:04:00+00:00
An history of NVidia Stream Multiprocessor
Fabien Sanglard 2020-05-02T00:00:00+00:00
preview for Imbroglio: Mizzenmast
Mighty Vision 2020-05-02T15:37:00+00:00
imbroglio - expansion & crash
Mighty Vision 2020-05-05T16:58:00+00:00
Memories of Working on Homestuck - Faultlore
Faultlore 2020-05-06T00:00:00+00:00
0x10 rules
Fabien Sanglard 2020-05-07T00:00:00+00:00
definite plan
Mighty Vision 2020-05-10T11:35:00+00:00
animenostalgia:Gunnm (aka Battle Angel Alita) by Yukito Kishiro
ONO-SENDAI CYBERSPACE 7 2020-05-10T16:10:58+00:00
pinkbubblegum3:Katsuya Terada ♥
ONO-SENDAI CYBERSPACE 7 2020-05-10T16:11:12+00:00
thevideogameartarchive: Artwork from ‘ESWAT’ on the Sega...
ONO-SENDAI CYBERSPACE 7 2020-05-10T16:11:37+00:00
animarchive:Japanese artist and sculptor Kow Yokoyama -...
ONO-SENDAI CYBERSPACE 7 2020-05-10T16:11:45+00:00
curatorofthisdigitalmorass: SYD MEAD
ONO-SENDAI CYBERSPACE 7 2020-05-10T16:11:51+00:00
rocketumbl: rocketumbl: Fireball SG Ma.K. (made by あま)
ONO-SENDAI CYBERSPACE 7 2020-05-10T16:14:33+00:00
⚔️ The Panzer Paladin Gameplay Trailer is here! ⚔️ ✅ Click here...
Tribute Games 2020-05-15T14:01:43+00:00
Revisiting the postcard pathtracer
Fabien Sanglard 2020-05-18T00:00:00+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2020-05-18T22:03:26+00:00
37: The Clojurists Together Foundation with lvh
The REPL 2020-05-21T09:13:00+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2020-05-23T14:57:13+00:00
A tale of Ghosts'n Goblins'n Crocos
Fabien Sanglard 2020-05-30T00:00:00+00:00
A simple way to get more value from metrics
Dan Luu 2020-05-30T07:06:34+00:00
A simple way to get more value from tracing
Dan Luu 2020-05-31T07:06:34+00:00
Passing runtime data to AWK
Arabesque 2020-05-31T11:55:54+00:00
Straight Out Of Furlough
GAMEPOPPER 2020-05-31T15:07:51+00:00
Finding the Story
Dan Luu 2020-06-02T07:05:34+00:00
I want to make sure that nobody is missing what Trump is doing he calls out for “Law and Order” and…
Squashed 2020-06-03T00:34:10+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2020-06-05T17:14:07+00:00
Discret 11, the French TV encryption of the 80's
Fabien Sanglard 2020-06-07T00:00:00+00:00
NeXTstep on the HP 712 Part 2: Getting Software
Pizza Box Computer 2020-06-09T13:45:00+00:00
Life Harvester #18: Frances Beal, Sylvia Rivera, Walter Benjamin
Life Harvester 2020-06-15T16:33:31+00:00
HASH: a free, online platform for modeling the world
Joel on Software 2020-06-18T14:12:25+00:00
agnosthesia
The Dictionary of Obscure Sorrows 2020-06-20T19:11:05+00:00
Alyse Galvin on Coronavirus in Alaska
Idle Words 2020-06-24T13:30:00+00:00
How do cars do in out-of-sample crash testing?
Dan Luu 2020-06-30T07:06:34+00:00
Ultimate Oldschool PC Font Pack v2.0 Released
int10h.org - VileR's blog 2020-07-13T09:25:18+00:00
Life Harvester #19: Black, Young, & Educated
Life Harvester 2020-07-15T10:34:00+00:00
The Gods Pocket Peak Trail
Embedded in Academia 2020-07-23T15:41:01+00:00
Alive2 Part 3: Things You Can and Can’t Do with Undef in LLVM
Embedded in Academia 2020-07-31T20:33:05+00:00
wrenmcdonald: Ex.Mag FULL METAL DREAMLAND, the genre-based...
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:19:30+00:00
Dragon’s Heaven - Makoto Kobayashi, Toshihiro Hirano
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:19:37+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:19:49+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:20:12+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:20:19+00:00
ultrakillblast:TRON (1982)
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:20:29+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:21:42+00:00
wrenmcdonald:Ex.Mag 01 back cover 💚
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:22:18+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:22:48+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:23:12+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:23:33+00:00
yodawgiheardyoulikemecha: Space dude
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:24:02+00:00
ravenkult: Fullfillment Center by Brian Sum...
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:24:11+00:00
Photo
ONO-SENDAI CYBERSPACE 7 2020-08-02T13:24:41+00:00
ZZT Stories: The Reconstruction
Posts on asie's blog 2020-08-04T20:30:00+00:00
⚔️ Panzer Paladin: OUT NOW! ⚔️ ✅ Get it on Steam! ✅🎮 Get it on...
Tribute Games 2020-08-07T15:05:56+00:00
Top 5 indie games of July 2020: The IND13 Picks
Tribute Games 2020-08-07T15:07:13+00:00
Sara Huddleston on the Latino Vote in Iowa
Idle Words 2020-08-08T00:40:00+00:00
Responsible and Effective Bugfinding
Embedded in Academia 2020-08-17T18:36:43+00:00
Life Harvester #20: Prison Abolition
Life Harvester 2020-08-18T16:33:34+00:00
Okay how did I not understand this situation until Steve Bannon...
Squashed 2020-08-20T16:47:27+00:00
When NAT Bites — Use a Reverse VPN
Winny's Blog 2020-08-31T05:00:00+00:00
Effective Political Giving
Idle Words 2020-09-03T04:05:00+00:00
tinycartridge: Panzer Paladin is better than pretty much any...
Tribute Games 2020-09-09T17:06:01+00:00
Life Harvester #21: T'Shuva, Electric Shavers, Sleep Headphones, Granola
Life Harvester 2020-09-15T13:02:06+00:00
Git Push
int10h.org - VileR's blog 2020-09-16T14:21:09+00:00
azspot: “The threat of increasing the size of the court to 13 might be enough to discourage...
Squashed 2020-09-22T12:59:47+00:00
Recent Trends in Wealth-Holding by Race and Ethnicity: Evidence from the Survey of Consumer Finances
Squashed 2020-09-24T18:55:42+00:00
Systemic racism really isn’t complicated or controversial
Squashed 2020-09-24T21:06:42+00:00
Another analogy on education about white supremacy
Squashed 2020-09-27T17:24:52+00:00
On Trump’s Taxes
Squashed 2020-09-28T15:26:54+00:00
A personal example of systemic racism
Squashed 2020-09-28T16:58:55+00:00
"This sleazy Supreme Court double-dealing is the last gasp of a corrupt Republican leadership, numb..."
Squashed 2020-09-28T18:30:52+00:00
On Colorblindness
Squashed 2020-10-01T22:29:10+00:00
Switching to Lenovo Carbon X1
Fabien Sanglard 2020-10-02T00:00:00+00:00
2020 Episode 1
Verb Your Enthusiasm 2020-10-03T22:49:06+00:00
Protests and Power
Idle Words 2020-10-04T22:17:00+00:00
2020 Episode 2
Verb Your Enthusiasm 2020-10-11T08:42:13+00:00
WHEN 13.3 > 14
Fabien Sanglard 2020-10-12T00:00:00+00:00
Life Harvester #22: The Ramones, H2O, Less Than Jake, 25 Ta Life, Blanks 77, US Bombs, Social Distortion
Life Harvester 2020-10-15T16:25:48+00:00
Twist Turn Shoot Burn: A Postmortem
GAMEPOPPER 2020-10-19T17:16:17+00:00
On using Markdown with Sphinx - onward to Evennia 0.9.5
Griatch's Evennia musings 2020-10-19T22:21:00+00:00
On using Markdown with Sphinx
Evennia Devblog RSS Feed 2020-10-20T00:00:00+00:00
Ultima: Through Farthest Lands and Deepest Dungeons
CRPG Adventures 2020-10-27T08:18:00+00:00
Newsletter #10 - Neovim v0.4.4
Neovim 2020-10-28T00:00:00+00:00
Game Engine Black Book: Wolfenstein 3D, Korean Edition
Fabien Sanglard 2020-10-30T00:00:00+00:00
Neon Signs Banana Neon Light Sign Real Glass Neon Sign Neon Lights Neon Wall Sign Real Neon Decorative Light for Home Bedroom Room Decor Bar Office Halloween Party - - Amazon.com
Terrible Banana 2020-10-31T03:03:32+00:00
Nearly 60% of registered voters in North Carolina have voted
Squashed 2020-10-31T17:30:53+00:00
Ultima: Victory!
CRPG Adventures 2020-11-03T10:29:00+00:00
About my keyboard choices
Winny's Blog 2020-11-04T02:37:00+00:00
Game 50: Kadath (1979)
CRPG Adventures 2020-11-08T15:23:00+00:00
Mafia II – Hat vs. Hair
Simonschreibt. 2020-11-09T17:05:01+00:00
Full motion video in ZZT: State of the art
Posts on asie's blog 2020-11-09T19:51:00+00:00
These are called opportunities
Fabien Sanglard 2020-11-12T00:00:00+00:00
Wenyan-lang
esoteric.codes 2020-11-12T07:04:00+00:00
Evennia 0.9.5 released
Evennia Devblog RSS Feed 2020-11-14T00:00:00+00:00
Evennia 0.9.5 released!
Griatch's Evennia musings 2020-11-14T17:46:00+00:00
Game 51: Local Call for Death (1979)
CRPG Adventures 2020-11-15T19:01:00+00:00
Life Harvester #23: Bëëf Stew, Spooky Movies & TV Shows
Life Harvester 2020-11-16T19:11:04+00:00
Classical Chinese as a Programming Language
esoteric.codes 2020-11-23T07:05:00+00:00
Computing with JS's undefined
esoteric.codes 2020-11-23T13:12:00+00:00
2020 Episode 3
Verb Your Enthusiasm 2020-11-29T17:44:11+00:00
Oak
esoteric.codes 2020-12-01T06:23:00+00:00
More Font Updates: Oldschool PC Pack, Flexi IBM VGA
int10h.org - VileR's blog 2020-12-01T20:23:23+00:00
Recovering data from a corrupted USB thumbdrive using ddrescue
The Grymoire 2020-12-01T20:58:50+00:00
Turing Paint
esoteric.codes 2020-12-14T06:47:00+00:00
Life Harvester #24: Leaf Piles, Throwing Blueberries At Yogurt, Ask a Shmuck, Miss D's Movie Madness
Life Harvester 2020-12-14T12:07:02+00:00
The beautiful silent thunderbolt-3 PC
Fabien Sanglard 2020-12-22T00:00:00+00:00
Against essential and accidental complexity
Dan Luu 2020-12-29T00:00:00+00:00
Life Harvester #🤷🏻‍♀️: Your Favorite Thing
Life Harvester 2020-12-30T19:13:30+00:00
Happy New Years 2021!
Evennia Devblog RSS Feed 2021-01-01T00:00:00+00:00
Happy new years 2021! Evennia things to come this year
Griatch's Evennia musings 2021-01-01T12:38:00+00:00
xchg rax, rax
esoteric.codes 2021-01-05T06:17:00+00:00
untitled
Squashed 2021-01-08T02:20:10+00:00
The confusing world of USB
Fabien Sanglard 2021-01-10T00:00:00+00:00
About my Medium posts
LESSIG Blog 2021-01-10T14:29:42+00:00
MEDIUM: Ted Cruz and Josh Hawley’s illegal objection
LESSIG Blog 2021-01-10T14:36:24+00:00
Simulating CRT Monitors with FFmpeg (Pt. 1: Color CRTs)
int10h.org - VileR's blog 2021-01-10T16:29:21+00:00
Autopoiesis
EXO 2021-01-11T05:34:58+00:00
aftersome
The Dictionary of Obscure Sorrows 2021-01-11T21:20:27+00:00
How to Escape the Confines of Time and Space According to the CIA
EXO 2021-01-13T17:16:06+00:00
The Geomagnetic Field and Us
EXO 2021-01-14T15:15:07+00:00
MEDIUM: Why Senator Hawley’s latest defense is just more offense.
LESSIG Blog 2021-01-15T18:44:02+00:00
Stretching The Electric Diamond
EXO 2021-01-18T19:30:43+00:00
KFC Mascot Col. Sanders Talks Malbolge Programming on General Hospital—Wait, What?
esoteric.codes 2021-01-19T04:56:00+00:00
Testers wanted!
DeathTrash 2021-01-19T16:27:50+00:00
Life Harvester #25: Lazy Magnet, It Did Happen Here, Pedrodamus 2021 Trend Forecast
Life Harvester 2021-01-22T13:28:09+00:00
DONE: Final words on the Cruz and Hawley outrage
LESSIG Blog 2021-01-22T16:38:09+00:00
Oral Argument in PATRICK v. Alaska
LESSIG Blog 2021-01-22T16:39:22+00:00
Hey journalists, here’s the question you need to be asking the insurrectionists.
LESSIG Blog 2021-01-25T16:43:57+00:00
Simulating CRT Monitors with FFmpeg (Pt. 2: Monochrome CRTs)
int10h.org - VileR's blog 2021-02-03T21:43:17+00:00
Interview with 100 Rabbits
esoteric.codes 2021-02-04T05:10:00+00:00
A Global Kind of Mood
EXO 2021-02-05T00:11:44+00:00
You want sudo -i or su -
Winny's Blog 2021-02-14T22:21:00+00:00
Life Harvester #26: Milford Graves, Best Friends, The Big Bagel Question, Miss Soup Pussy
Life Harvester 2021-02-18T12:35:04+00:00
Back in Development
The Ground Gives Way 2021-03-01T16:06:58+00:00
Interview with David Madore
esoteric.codes 2021-03-02T06:36:00+00:00
Trunk Updates 2 March 2021
Dungeon Crawl Stone Soup 2021-03-02T12:41:16+00:00
Simulating NON-CRT Monitors with FFmpeg: Flat Panel Displays
int10h.org - VileR's blog 2021-03-02T22:09:44+00:00
🌱⚡The Plant with a Pulse ⚡🌱
EXO 2021-03-03T01:54:27+00:00
Priority Adventure 3: Mission: Asteroid (1980)
CRPG Adventures 2021-03-11T11:28:00+00:00
❤️⚓ Flinthook Concept Art: Mr.Blort is a tired fellow. You...
Tribute Games 2021-03-14T16:01:02+00:00
Some tips when copying, recovering disks
Winny's Blog 2021-03-15T00:53:00+00:00
What Programming Language Would Yoko Ono Create?
esoteric.codes 2021-03-16T06:44:00+00:00
Life Harvester 27: I Killed Kurt Cobain, This Shirt Sucks, Some Records I Love, Dykes To Watch Out For
Life Harvester 2021-03-16T12:01:40+00:00
Writing Small CLI Programs in Common Lisp
Steve Losh 2021-03-17T16:10:00+00:00
Rogue of the Seven Seas – 7DRL Postmortem
GAMEPOPPER 2021-03-17T18:00:00+00:00
Where do I begin?
Evennia Devblog RSS Feed 2021-03-21T00:00:00+00:00
Where do I begin? (repost)
Griatch's Evennia musings 2021-03-21T11:59:00+00:00
Snarf YouTube videos off gather.town
Winny's Blog 2021-03-22T02:41:40+00:00
Game 52: Eamon Scenario 2 - The Lair of the Minotaur (1979)
CRPG Adventures 2021-03-22T12:34:00+00:00
Trunk Updates 23 March 2021
Dungeon Crawl Stone Soup 2021-03-23T10:05:04+00:00
Interview with Jon Corbett
esoteric.codes 2021-03-30T06:32:00+00:00
ringlorn
The Dictionary of Obscure Sorrows 2021-04-02T14:23:23+00:00
The Matt Gaetz Saga
Squashed 2021-04-02T14:34:47+00:00
Trunk Updates 2 April 2021
Dungeon Crawl Stone Soup 2021-04-02T17:45:09+00:00
Moving blog to ox-hugo
Winny's Blog 2021-04-03T06:46:00+00:00
Game 53: Maces & Magic - Balrog Sampler (1979)
CRPG Adventures 2021-04-03T10:38:00+00:00
Game Engine Black Book: DOOM, Korean Edition
Fabien Sanglard 2021-04-05T00:00:00+00:00
'The Gateway' NFT
EXO 2021-04-07T21:30:18+00:00
Safe CRT Monitor Shipping: IBM 5153 Makes it through DHL!
int10h.org - VileR's blog 2021-04-08T11:44:26+00:00
Found: Page 25 of the CIA’s Gateway Report on Astral Projection
EXO 2021-04-08T13:00:37+00:00
Life Harvester 28: RIP Dan Klein, A Character Sketch From A Novel I'm Writing
Life Harvester 2021-04-15T16:29:51+00:00
Trunk Updates 19 April 2021
Dungeon Crawl Stone Soup 2021-04-19T16:32:56+00:00
Stripe and Solid-State Economics
The Diff 2021-05-07T13:56:33+00:00
Balrog Sampler: Near Victory
CRPG Adventures 2021-05-11T16:06:00+00:00
Photo
Terrible Banana 2021-05-11T16:45:32+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2021-05-12T22:03:57+00:00
llvm-reduce
Embedded in Academia 2021-05-13T16:58:00+00:00
Lockdown: Day 3,689. G-G on Facebook - G-G on Twitter
garfield minus garfield 2021-05-14T10:53:53+00:00
Life Harvester 29: Remix Requests, Hate Your Friends Vs Dream Baby Dream, The Last Time I Did Acid, Writing Letters
Life Harvester 2021-05-14T12:24:19+00:00
Observing my cellphone switch towers
Fabien Sanglard 2021-05-15T00:00:00+00:00
The Harmonic Grid
EXO 2021-05-16T15:00:32+00:00
Buddy Roemer, RIP
LESSIG Blog 2021-05-18T12:59:00+00:00
Priority CRPG 4: Wizardry: Proving Grounds of the Mad Overlord (1981)
CRPG Adventures 2021-05-26T16:07:00+00:00
Freenode is dead, long live Freenode!
Winny's Blog 2021-05-27T04:13:05+00:00
Finished all of netflix.
garfield minus garfield 2021-05-31T20:28:33+00:00
Kinda a big announcement
Joel on Software 2021-06-02T16:36:19+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2021-06-06T21:55:57+00:00
The Apple Compact Unwinding Format: Documented and Explained - Faultlore
Faultlore 2021-06-09T00:00:00+00:00
Wizardry: Level One
CRPG Adventures 2021-06-14T13:07:00+00:00
Mind Set (2) An interpretation of the delicate balance of...
Romain Laurent 2021-06-14T19:54:47+00:00
IRC presence moved to libera.chat
Dungeon Crawl Stone Soup 2021-06-15T16:06:30+00:00
Interview with Zzo38
esoteric.codes 2021-06-21T07:39:00+00:00
Plugging My Newest Blog
CRPG Adventures 2021-06-22T05:59:00+00:00
Life Harvester 30: Elon Musk Meth Conspiracy, Ask A Shmuck, Marc & Olivia's Bidet, Free Palestine, Texan Summer Jams
Life Harvester 2021-06-25T11:17:00+00:00
Trunk Updates 25 June 2021
Dungeon Crawl Stone Soup 2021-06-25T19:16:31+00:00
Wizardry: Level Two
CRPG Adventures 2021-06-27T07:01:00+00:00
)Comfort Zone(
Romain Laurent 2021-06-27T22:44:19+00:00
Meet Chuck Easttom
EXO 2021-06-29T01:35:13+00:00
Enjoying the view
Romain Laurent 2021-06-30T01:16:17+00:00
The EXO Guide to Steganography
EXO 2021-06-30T16:51:24+00:00
Trunk Updates 1 July 2021 and Tournament Date Set
Dungeon Crawl Stone Soup 2021-07-01T21:43:20+00:00
Wizardry: Level Three
CRPG Adventures 2021-07-04T08:54:00+00:00
No need to reinstall your OS
Winny's Blog 2021-07-09T02:01:42+00:00
2020 Prize Episode: Vain Empires
Verb Your Enthusiasm 2021-07-09T15:57:48+00:00
Neovim News #11 - The Christmas Issue
Neovim 2021-07-12T00:00:00+00:00
Wizardry: Grinding Interlude
CRPG Adventures 2021-07-12T13:24:00+00:00
A moment of self reflection
Romain Laurent 2021-07-12T20:16:46+00:00
Fat Dactyls
esoteric.codes 2021-07-13T10:18:00+00:00
Life Harvester 31: First Ever Poetry Issue feat. Mya Spalter, Thera Webb, Ana Armengod, David Morse
Life Harvester 2021-07-20T14:06:58+00:00
A monorepo misconception - atomic cross-project commits
Juho Snellman's Weblog 2021-07-21T11:00:00+00:00
Wizardry: Level Four
CRPG Adventures 2021-07-21T17:30:00+00:00
In the Movie “Das letzte Land” (D 2019, Dir.: Marcel...
Source Code in TV and Films 2021-07-22T08:19:48+00:00
From Guns Akimbo (2019), some generic ffmpeg wrapper Go code,...
Source Code in TV and Films 2021-07-22T08:19:54+00:00
From WWE’s Money In The Bank PPV - the “Smackdown...
Source Code in TV and Films 2021-07-22T08:20:08+00:00
In Czech movie Vysoká hra there is “high-end" police...
Source Code in TV and Films 2021-07-22T08:24:17+00:00
American Gods Season 2
Source Code in TV and Films 2021-07-22T08:24:25+00:00
Watching #UploadOnPrime and it’s 2033 and they are in a file...
Source Code in TV and Films 2021-07-22T08:24:33+00:00
Futurama Season 1 Episode 9 Basic code to go hell.
Source Code in TV and Films 2021-07-22T08:24:39+00:00
A screenshot from the recent UK broadcast of the “Why We...
Source Code in TV and Films 2021-07-22T08:24:50+00:00
Photo
Terrible Banana 2021-07-24T01:06:34+00:00
July 24 Trunk Update Post and 0.27 Tournament Page
Dungeon Crawl Stone Soup 2021-07-24T07:37:39+00:00
Wizardry: Level Five
CRPG Adventures 2021-07-26T09:33:00+00:00
640 Pages in 15 Months
journal.stuffwithstuff.com 2021-07-29T07:00:00+00:00
0.27 “The Cursed Flame”
Dungeon Crawl Stone Soup 2021-07-30T07:06:42+00:00
Wizardry: Levels Six to Eight
CRPG Adventures 2021-08-01T19:19:00+00:00
Beams of consciousness
Romain Laurent 2021-08-03T00:15:09+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2021-08-04T21:09:52+00:00
The Dictionary of Obscure Sorrows—the book. 12 years in the...
The Dictionary of Obscure Sorrows 2021-08-08T12:42:50+00:00
Wizardry: Level 9 (a tale of two disasters)
CRPG Adventures 2021-08-09T08:32:00+00:00
Using an old Supermicro IPMI to configure broken networking
Winny's Blog 2021-08-10T01:05:00+00:00
Life Harvester 32: Actual Freaks, Munchin On Crunchy Cukes, Delta Variant, A Picture Of A Poster
Life Harvester 2021-08-17T11:25:11+00:00
Coding in Indigenous African Languages
esoteric.codes 2021-08-18T09:20:00+00:00
Screenshot from the Departure season 2 episode 1 showing part of...
Source Code in TV and Films 2021-08-20T07:36:52+00:00
0.27 Tournament Results
Dungeon Crawl Stone Soup 2021-08-20T10:31:29+00:00
0.27.1 Bugfix Release
Dungeon Crawl Stone Soup 2021-08-21T18:45:50+00:00
Measurement, benchmarking, and data analysis are underrated
Dan Luu 2021-08-27T00:00:00+00:00
Trunk Updates 29 August 2021
Dungeon Crawl Stone Soup 2021-08-29T07:41:14+00:00
Monkey: the satirical Go package used unwittingly by Arduino and SalesForce
esoteric.codes 2021-08-30T10:33:00+00:00
38: Banking and Clojure with Allen Rohner
The REPL 2021-08-31T03:30:00+00:00
Wizardry: Level 10
CRPG Adventures 2021-09-05T07:45:00+00:00
39: Clojure Goes Fast with Alexander Yakushev
The REPL 2021-09-06T20:00:00+00:00
Slight magic rework
The Ground Gives Way 2021-09-08T13:48:45+00:00
austice
The Dictionary of Obscure Sorrows 2021-09-09T20:13:53+00:00
40: Shipping Clojure code with Paulus Esterhazy
The REPL 2021-09-13T20:00:00+00:00
Photo
Terrible Banana 2021-09-19T15:17:59+00:00
41: Clojure pre-history with Chris Houser
The REPL 2021-09-20T19:08:00+00:00
Life Harvester 33: Body-ody-ody-ody-ody-ody-ody (Weekend Sensation Journal), Kelly's Turnstile Review
Life Harvester 2021-09-21T17:31:04+00:00
Escher Circuits: Using Vision to Perform Computation
esoteric.codes 2021-09-22T07:29:00+00:00
Trunk Updates 23 September 2021
Dungeon Crawl Stone Soup 2021-09-23T16:22:45+00:00
The value of in-house expertise
Dan Luu 2021-09-29T00:00:00+00:00
Censer and Aggravation Rework
The Ground Gives Way 2021-09-29T21:27:21+00:00
Bitcoin
codersnotes.com 2021-10-03T07:00:00+00:00
42: Faster JSON parsing with Erik Assum
The REPL 2021-10-07T08:00:00+00:00
Some reasons to work on productivity and velocity
Dan Luu 2021-10-15T00:00:00+00:00
What to learn
Dan Luu 2021-10-18T00:00:00+00:00
Willingness to look stupid
Dan Luu 2021-10-21T00:00:00+00:00
🔮 web3 Is In Our Nature II 🌱
EXO 2021-10-29T01:35:44+00:00
🔮 web3 Is In Our Nature I 🌱
EXO 2021-10-29T14:00:43+00:00
Unstable Grounds – Ludum Dare and the Future
GAMEPOPPER 2021-10-30T10:23:00+00:00
FATAL FRAME / PROJECT ZERO: Maiden of Black Water Table for Cheat Engine
Ian Murdock 2021-11-03T02:09:41+00:00
Persona 5 Table for Cheat Engine
Ian Murdock 2021-11-03T15:57:09+00:00
Cyberpunk 2077 Table for Cheat Engine
Ian Murdock 2021-11-03T17:22:45+00:00
Prison Simulator Table for Cheat Engine
Ian Murdock 2021-11-04T23:21:01+00:00
Forza Horizon 5 Trainer
Ian Murdock 2021-11-05T22:50:44+00:00
A Close Look at a Spinlock
Embedded in Academia 2021-11-06T19:57:06+00:00
Culture matters
Dan Luu 2021-11-08T00:00:00+00:00
Let’s Build a Zoo Cheat Engine Table
Ian Murdock 2021-11-08T22:09:04+00:00
Jurassic World Evolution 2 Trainer
Ian Murdock 2021-11-09T23:13:45+00:00
Pre-order your copy of “The Dictionary of Obscure Sorrows” from Simon & Schuster:…
The Dictionary of Obscure Sorrows 2021-11-12T02:05:49+00:00
PRE-ORDER your copy of “The Dictionary of Obscure Sorrows” here, from Simon & Schuster:…
The Dictionary of Obscure Sorrows 2021-11-13T02:26:40+00:00
43: Clojure, The Essential Reference with Renzo Borgatti
The REPL 2021-11-13T03:10:33+00:00
Before CurseForge (Microblog)
Posts on asie's blog 2021-11-14T10:32:00+00:00
etterath
The Dictionary of Obscure Sorrows 2021-11-14T20:02:53+00:00
Individuals matter
Dan Luu 2021-11-15T00:00:00+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2021-11-15T21:37:20+00:00
Forza Horizon 5 Cheat Engine Table
Ian Murdock 2021-11-16T13:28:35+00:00
This project started here on Tumblr more than 10 years ago. To all my followers, I can’t thank you…
The Dictionary of Obscure Sorrows 2021-11-16T16:44:47+00:00
Banana Phone Bluetooth Handset for Cell Phones
Terrible Banana 2021-11-16T17:07:04+00:00
CONGRATS for the publication of the book!! I’m so glad this is finally happening! I just have one question: how are the contents organized? Is it like a normal dictionary with entries in alphabetical order, or in order of creation like this tumblr, or something else entirely?
The Dictionary of Obscure Sorrows 2021-11-17T17:06:59+00:00
The blog moved!
Evennia Devblog RSS Feed 2021-11-18T00:00:00+00:00
Forza Horizon 5 Cheat Engine {vinny2k}
Ian Murdock 2021-11-18T13:20:29+00:00
watashiato
The Dictionary of Obscure Sorrows 2021-11-18T20:17:16+00:00
The Evennia blog has moved to evennia.com!
Griatch's Evennia musings 2021-11-18T21:53:00+00:00
Major errors on this blog (and their corrections)
Dan Luu 2021-11-22T00:00:00+00:00
I never imagined this was possible, but “The Dictionary of Obscure Sorrows” is now a New York Times…
The Dictionary of Obscure Sorrows 2021-11-25T04:05:40+00:00
Migrating from Emacs 26 to Emacs 27 on Gentoo
Winny's Blog 2021-11-28T06:00:00+00:00
'Space Covidders' Goes to the Arcade?
int10h.org - VileR's blog 2021-11-28T20:37:05+00:00
Thievery rework/rebalancing
The Ground Gives Way 2021-12-02T13:21:26+00:00
Some latency measurement pitfalls
Dan Luu 2021-12-06T00:00:00+00:00
Trunk Updates 6 December 2021
Dungeon Crawl Stone Soup 2021-12-06T20:21:04+00:00
Halo Infinite Trainer
Ian Murdock 2021-12-10T20:02:00+00:00
Stanford Professor Garry Nolan Is Analyzing Anomalous Materials From UFO Crashes
EXO 2021-12-10T21:40:33+00:00
Some thoughts on writing
Dan Luu 2021-12-13T00:00:00+00:00
Melee Weapon Rebalancing
The Ground Gives Way 2021-12-17T13:28:52+00:00
The container throttling problem
Dan Luu 2021-12-18T00:00:00+00:00
Trunk Updates 18 December 2021
Dungeon Crawl Stone Soup 2021-12-18T13:55:26+00:00
Following Street Fighter 2 paper trails
Fabien Sanglard 2021-12-22T00:00:00+00:00
Street Fighter 2: The World Warrier
Fabien Sanglard 2021-12-23T00:00:00+00:00
Street Fighter 2: Subtile accurate animation
Fabien Sanglard 2021-12-24T00:00:00+00:00
Street Fighter 2: Spin when you can't
Fabien Sanglard 2021-12-24T00:00:00+00:00
Trunk Updates 28 December 2021
Dungeon Crawl Stone Soup 2021-12-28T18:08:10+00:00
Updating The Single Most Influential Book of the BASIC Era
Coding Horror 2021-12-31T23:49:00+00:00
ZZT World Creation Contest '91: Allen Pilgrim and Tom Breton's recollections
Posts on asie's blog 2022-01-04T19:15:00+00:00
Into 2022 with thanks and plans
Evennia Devblog RSS Feed 2022-01-06T00:00:00+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2022-01-08T11:42:47+00:00
11-Streak by Gambler Justice
The Ground Gives Way 2022-01-11T12:35:36+00:00
Street Fighter 2: Sound System Internals
Fabien Sanglard 2022-01-15T00:00:00+00:00
Trunk Updates 17 January 2022 and Tournament Announcement
Dungeon Crawl Stone Soup 2022-01-18T04:27:28+00:00
TGGW v2.6 is out!
The Ground Gives Way 2022-01-22T11:24:32+00:00
Destroy All Values: Designing Deinitialization in Programming Languages - Faultlore
Faultlore 2022-01-23T00:00:00+00:00
Compiling GhidraNinja’s Pico Debug’N’Dump
The Grymoire 2022-01-24T18:03:54+00:00
Making the web better. With blocks!
Joel on Software 2022-01-27T17:14:00+00:00
Set up a Private GitLab Runner on Alpine Linux
Winny's Blog 2022-01-30T05:00:21+00:00
0.28 Tournament Page and Schedule
Dungeon Crawl Stone Soup 2022-02-01T17:31:36+00:00
A decade of major cache incidents at Twitter
Dan Luu 2022-02-02T00:00:00+00:00
Cocktail party ideas
Dan Luu 2022-02-02T00:00:00+00:00
Life Harvester 34: I Still Can't Get Any Writing Done So I Reprinted A Joan Didion Essay Without Permission
Life Harvester 2022-02-03T18:21:30+00:00
0.28 “The Rise and Fall of Ignis Zotdust and the Spiders from Hell”
Dungeon Crawl Stone Soup 2022-02-03T23:34:42+00:00
plague
Mighty Vision 2022-02-07T13:01:00+00:00
Auto-rip Music CDs
Winny's Blog 2022-02-07T20:52:27+00:00
Mind Set
Romain Laurent 2022-02-09T20:22:27+00:00
The Factorio Mindset
The Diff 2022-02-11T14:31:56+00:00
Stupid Dog
journal.stuffwithstuff.com 2022-02-13T08:00:00+00:00
But her emails!
Squashed 2022-02-19T00:39:11+00:00
CPS-1: GFX system internals
Fabien Sanglard 2022-02-20T00:00:00+00:00
Misidentifying talent
Dan Luu 2022-02-21T00:00:00+00:00
0.28 Tournament Results
Dungeon Crawl Stone Soup 2022-02-21T22:31:37+00:00
Give me all the PC Engine ports ⊟
Tiny Cartridge 3DS 2022-02-25T18:01:03+00:00
Wizardry: Pyrrhic Victory
CRPG Adventures 2022-02-27T07:42:00+00:00
Faultlore: Learning Through Errors - Faultlore
Faultlore 2022-02-27T18:18:56+00:00
Great news for your PAC-PASSION ⊟
Tiny Cartridge 3DS 2022-03-02T23:24:41+00:00
The 2030 Self-Driving Car Bet
Coding Horror 2022-03-04T18:53:32+00:00
Trunk updates 6 March 2022
Dungeon Crawl Stone Soup 2022-03-06T17:12:55+00:00
Why is it so hard to buy things that work well?
Dan Luu 2022-03-14T00:00:00+00:00
C Isn't A Programming Language Anymore - Faultlore
Faultlore 2022-03-16T21:24:08+00:00
Rust's Unsafe Pointer Types Need An Overhaul - Faultlore
Faultlore 2022-03-19T22:13:17+00:00
DSTs Are Just Polymorphically Compiled Generics - Faultlore
Faultlore 2022-03-30T22:13:17+00:00
Taito Milestones out April 15 ⊟
Tiny Cartridge 3DS 2022-04-05T18:48:34+00:00
The Tower of Weakenings: Memory Models For Everyone - Faultlore
Faultlore 2022-04-05T20:07:14+00:00
In defense of simple architectures
Dan Luu 2022-04-06T00:00:00+00:00
Defaults Affect Inference in Rust: Expressions Instead Of Types - Faultlore
Faultlore 2022-04-10T23:00:13+00:00
There appears to be some disagreement regarding whether Ukraine sank the Moskva with missiles or, as…
Squashed 2022-04-14T22:40:40+00:00
Gotta Protectors is back and a little weirder on Switch ⊟
Tiny Cartridge 3DS 2022-04-16T13:06:03+00:00
What if you… listened to Retronauts this week ⊟
Tiny Cartridge 3DS 2022-04-21T13:49:26+00:00
Life Harvester 35: May Her Memory Be A Blessing / זיכרונה לברכה / Zikhrona Livrakha, Poems About Grief
Life Harvester 2022-04-22T19:16:19+00:00
Neovim News #12 - What's New In Neovim 0.7
Neovim 2022-04-26T00:00:00+00:00
My Taipei Quarantine
Idle Words 2022-04-26T23:12:00+00:00
Priority Adventure 4: Strange Odyssey (1979)
CRPG Adventures 2022-05-01T07:49:00+00:00
USB Cheat Sheet
Fabien Sanglard 2022-05-05T00:00:00+00:00
The Beautiful Diablo 2 Resurrected machine
Fabien Sanglard 2022-05-08T00:00:00+00:00
Racket on Digital Ocean App Platform
Winny's Blog 2022-05-15T18:52:27+00:00
GDC/ADDON 2022: How (not) to create Textures for VFX
Simonschreibt. 2022-05-22T14:01:32+00:00
Priority Adventure 5: Mystery Fun House (1979)
CRPG Adventures 2022-05-24T11:25:00+00:00
Are you the absolute maniac who will buy Bob with no Bub ⊟
Tiny Cartridge 3DS 2022-05-30T16:45:15+00:00
High-Throughput, Formal-Methods-Assisted Fuzzing for LLVM
Embedded in Academia 2022-05-31T14:56:41+00:00
Why Build?
codersnotes.com 2022-06-03T07:00:00+00:00
About the PS/2 30-286's Hidden VGA Fonts
int10h.org - VileR's blog 2022-06-05T18:11:12+00:00
Formal-Methods-Based Bugfinding for LLVM’s AArch64 Backend
Embedded in Academia 2022-06-06T14:58:02+00:00
NixOS Migration
Winny's Blog 2022-06-08T18:12:00+00:00
The IBM 5153's True CGA Palette and Color Output
int10h.org - VileR's blog 2022-06-11T00:32:05+00:00
A match made in the eShop ⊟
Tiny Cartridge 3DS 2022-06-16T14:20:05+00:00
Trunk Updates 19 June 2022
Dungeon Crawl Stone Soup 2022-06-19T14:19:05+00:00
Save As: DNA 🧬 Part 1
EXO 2022-06-23T15:33:03+00:00
Priority Adventure 6: Pyramid of Doom (1979)
CRPG Adventures 2022-06-29T12:59:00+00:00
My Famicase Exhibition opening in LA ⊟
Tiny Cartridge 3DS 2022-06-30T17:47:08+00:00
Corporate consolidation is good, actually (in this one weird specific case) ⊟
Tiny Cartridge 3DS 2022-06-30T19:57:37+00:00
Tutorial-writing and Attributes galore
Evennia Devblog RSS Feed 2022-07-05T00:00:00+00:00
Cool DIY Super Famicom kit turned into cooler mini-TV kit ⊟
Tiny Cartridge 3DS 2022-07-11T20:20:12+00:00
Priority Adventure 7: Zork: The Great Underground Empire (1980)
CRPG Adventures 2022-07-17T15:25:00+00:00
On harm reduction
Apperceptive by Sam 2022-07-18T13:46:36+00:00
Trunk Updates 18 July 2022
Dungeon Crawl Stone Soup 2022-07-19T03:44:42+00:00
cafe la siesta -8bit edition!!!- 20th Anniversary...
⌘+V 2022-07-20T00:38:00+00:00
Driving is a social process
Apperceptive by Sam 2022-07-20T17:01:08+00:00
Romance of the three Kunios today ⊟
Tiny Cartridge 3DS 2022-07-21T15:11:49+00:00
The Nightmare Scenario
Apperceptive by Sam 2022-07-25T15:24:00+00:00
What autonomous cars see
Apperceptive by Sam 2022-07-27T18:53:07+00:00
#lang tinybasic
Winny's Blog 2022-07-28T02:20:43+00:00
Understanding Jane Street
The Diff 2022-08-01T12:39:05+00:00
The urbanist case for autonomous cars
Apperceptive by Sam 2022-08-01T19:46:48+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2022-08-02T19:17:38+00:00
Computer vision is badly defined
Apperceptive by Sam 2022-08-04T12:42:17+00:00
What if we were actually trying to use technology to make cars safe?
Apperceptive by Sam 2022-08-08T20:28:44+00:00
Simon’s Tech Art Learning Materials
Simonschreibt. 2022-08-13T18:12:26+00:00
Priority Adventure 8: Ghost Town (1980)
CRPG Adventures 2022-08-14T13:21:00+00:00
Trunk Updates 14 August 2022 and Tournament Announcement
Dungeon Crawl Stone Soup 2022-08-14T21:36:58+00:00
Publishers old and new bring me games old and old ⊟
Tiny Cartridge 3DS 2022-08-19T21:23:03+00:00
Hand-Crafted Artisanal Liquidity Provision
The Diff 2022-08-22T12:05:04+00:00
0.29 “Shooting Stars”
Dungeon Crawl Stone Soup 2022-08-25T00:42:44+00:00
There’s a new Puzzle Bobble and it’s very important ⊟
Tiny Cartridge 3DS 2022-08-26T00:13:34+00:00
Depending in Common Lisp
Steve Losh 2022-08-26T15:15:00+00:00
Save As: DNA 🧬 Part 2
EXO 2022-08-30T15:01:05+00:00
Tons of Mirrors ☀️🛰️🪞🌒⚡
EXO 2022-09-08T19:24:08+00:00
What is it all for
Apperceptive by Sam 2022-09-09T19:21:29+00:00
Futurist prediction methods and accuracy
Dan Luu 2022-09-12T00:00:00+00:00
patreon / videos
Mighty Vision 2022-09-12T22:25:00+00:00
Project plans and Splitting a Setting in two
Evennia Devblog RSS Feed 2022-09-17T00:00:00+00:00
44: Jank with Jeaye Wilkerson
The REPL 2022-09-17T03:57:33+00:00
I’m a fool for not already being hyped for SpiderHeck ⊟
Tiny Cartridge 3DS 2022-09-22T15:44:32+00:00
It's not clear anybody wants autonomous cars
Apperceptive by Sam 2022-09-23T16:08:31+00:00
Compiler Optimizations Are Hard Because They Forget - Faultlore
Faultlore 2022-09-24T06:06:58+00:00
CCPS: A CPS-1 SDK
Fabien Sanglard 2022-09-25T00:00:00+00:00
The Book Of CP-System
Fabien Sanglard 2022-09-25T00:00:00+00:00
Large, Static Website hosting with AWS and Let's Encrypt managed with Terraform
Winny's Blog 2022-09-30T01:03:23+00:00
Chat log exhibits from Twitter v. Musk case
Dan Luu 2022-10-01T00:00:00+00:00
45: Data Rabbit with Ryan Robitaille
The REPL 2022-10-03T20:00:00+00:00
The story Waze tells about problems with autonomous cars
Apperceptive by Sam 2022-10-07T20:28:39+00:00
untitled
garfield minus garfield 2022-10-13T17:16:47+00:00
Big news for exactly me: new Arkanoid, from Pastagames
Tiny Cartridge 3DS 2022-10-27T16:36:33+00:00
Salvation E13S1 some code described as a never seen encryption...
Source Code in TV and Films 2022-11-05T13:21:15+00:00
Screenshot from The Boys, Season 3, Episode 8 showing part of...
Source Code in TV and Films 2022-11-05T13:21:28+00:00
From “The Silent Sea”, Season 1, Episode 7,...
Source Code in TV and Films 2022-11-05T13:21:43+00:00
From Upgrade (2018), python code with messed up indentation
Source Code in TV and Films 2022-11-05T13:21:57+00:00
I just wrote this up:...
Source Code in TV and Films 2022-11-05T13:23:06+00:00
Gif like a Pro
Simonschreibt. 2022-11-13T23:46:16+00:00
Money, Credit, Trust, and FTX
The Diff 2022-11-14T13:41:25+00:00
Terminate Software like a Pro
Simonschreibt. 2022-11-14T14:55:40+00:00
Using PureRef as Mini-Photoshop
Simonschreibt. 2022-11-14T15:10:03+00:00
Simon’s old VFX
Simonschreibt. 2022-11-14T15:47:51+00:00
The Book Of CP-System, paper version
Fabien Sanglard 2022-11-22T00:00:00+00:00
Castlevania (DOS - Hercules)
⌘+V 2022-11-25T10:54:00+00:00
Leafcutter ants and orchids (rotate)
⌘+V 2022-11-25T10:55:13+00:00
Medieval Manuscript Fragments in the Classroom
medievalbooks 2022-11-30T20:04:51+00:00
Mango Passion Fruit
Romain Laurent 2022-12-01T17:35:14+00:00
Evennia 1.0 released!
Evennia Devblog RSS Feed 2022-12-03T00:00:00+00:00
ulan-bator:
⌘+V 2022-12-05T07:48:18+00:00
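"I used to think the ideal was to connect the whole world as one, but lately I've come to think that's impossible. Maybe people are happiest living inside a loose filter bubble. (Q: Why is that...?) Dunba..."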
⌘+V 2022-12-08T11:55:22+00:00
Transcript of Elon Musk on stage with Dave Chapelle
Dan Luu 2022-12-11T00:00:00+00:00
microstat
One Thing Well 2022-12-14T12:30:31+00:00
Books update
Fabien Sanglard 2022-12-15T00:00:00+00:00
shot-scraper
One Thing Well 2022-12-15T12:30:26+00:00
A Linux evening...
Fabien Sanglard 2022-12-16T00:00:00+00:00
Finicky
One Thing Well 2022-12-16T12:30:21+00:00
Bombadillo
One Thing Well 2022-12-17T12:30:20+00:00
s
One Thing Well 2022-12-18T12:30:25+00:00
SketchyBar
One Thing Well 2022-12-19T12:00:22+00:00
Progress on the Block Protocol
Joel on Software 2022-12-19T13:01:40+00:00
Happy Net Box
One Thing Well 2022-12-20T12:00:22+00:00
Rclone
One Thing Well 2022-12-21T12:00:22+00:00
Companion apps for Apple Music
One Thing Well 2022-12-22T12:00:19+00:00
podget
One Thing Well 2022-12-23T12:01:33+00:00
46: ClojureDart with Christophe Grand and Baptiste Dupuch
The REPL 2022-12-23T21:00:00+00:00
Osmosis S1E2, nanobot programming contains … Singleton...
Source Code in TV and Films 2022-12-24T15:34:39+00:00
Newsboat
One Thing Well 2022-12-29T12:00:19+00:00
Smol Pub
One Thing Well 2022-12-30T12:00:24+00:00
What Neovim shipped in 2022
Neovim 2022-12-31T00:00:00+00:00
Why Not Mars
Idle Words 2023-01-01T23:12:00+00:00
Type Checking If Expressions
journal.stuffwithstuff.com 2023-01-03T08:00:00+00:00
gum
One Thing Well 2023-01-06T13:05:26+00:00
New computer checklist
Winny's Blog 2023-01-09T06:00:00+00:00
nom
One Thing Well 2023-01-10T12:00:44+00:00
Heatwave
One Thing Well 2023-01-11T12:00:22+00:00
Trunk Updates 11 Jan 2023
Dungeon Crawl Stone Soup 2023-01-11T18:46:49+00:00
47: Executable textbooks with Sam Ritchie
The REPL 2023-01-12T14:20:00+00:00
Cilicon
One Thing Well 2023-01-12T15:35:33+00:00
yt-dlp
One Thing Well 2023-01-13T12:00:33+00:00
One Thing
One Thing Well 2023-01-18T12:00:44+00:00
Phi Chay Thai Cuisine – St. Paul, MN
You Care What We Think 2023-01-19T16:30:00+00:00
FrogFind
One Thing Well 2023-01-20T12:00:30+00:00
Mjolnir
Fabien Sanglard 2023-01-23T00:00:00+00:00
Linky
One Thing Well 2023-01-25T12:00:27+00:00
CGTC IV Talks, Day 2
Combinatorial Game Theory 2023-01-25T18:18:00+00:00
CGTC IV Talks, Day 3
Combinatorial Game Theory 2023-01-25T23:28:00+00:00
Quick Review: Nova Bar - Hudson, WI
You Care What We Think 2023-01-26T17:00:00+00:00
Wonder Boy, Bust a Move, New Zealand Story today⊟
Tiny Cartridge 3DS 2023-02-02T15:59:28+00:00
Croft Kitchen - Crosby, MN
You Care What We Think 2023-02-02T16:30:00+00:00
Can 4GiB meet your needs in 2023?
Winny's Blog 2023-02-07T00:00:00+00:00
Life Harvester 36: February 2023 as December 2021, Astrology as Personal Ads
Life Harvester 2023-02-08T18:36:41+00:00
Design for democracy - pro bono?
LESSIG Blog 2023-02-09T13:15:53+00:00
The Rib Co. - Twentynine Palms, CA
You Care What We Think 2023-02-09T17:00:00+00:00
Coming soon
Lefineder’s Substack 2023-02-12T18:08:38+00:00
Criminals ViewS Crime
Lefineder’s Substack 2023-02-13T00:27:25+00:00
Introducing BootFriend: unofficial custom firmware for WonderSwan Color
Posts on asie's blog 2023-02-15T23:15:00+00:00
LLMs, anthropocentric thinking, accuracy, and self-driving
Apperceptive by Sam 2023-02-16T15:01:42+00:00
Apostle Supper Club – St. Paul, MN
You Care What We Think 2023-02-16T17:00:00+00:00
Don't tase men, bro!
Lefineder’s Substack 2023-02-19T00:53:52+00:00
The Unreal Stencil Dragon
Simonschreibt. 2023-02-20T10:18:02+00:00
All you may need is HTML
Fabien Sanglard 2023-03-02T00:00:00+00:00
AI as UX
Apperceptive by Sam 2023-03-02T13:26:33+00:00
Status
Emily Short's Interactive Storytelling 2023-03-03T00:01:59+00:00
Pre-commit in GitHub Actions & GitLab CI
Winny's Blog 2023-03-09T10:00:00+00:00
Crime rise; more or the same.
Lefineder’s Substack 2023-03-10T23:05:31+00:00
Self-serving thought experiments
Apperceptive by Sam 2023-03-15T12:58:24+00:00
Old Chips, New Glitches: the CGA/CRTC "Phantom" VSync
int10h.org - VileR's blog 2023-03-21T07:18:31+00:00
Patriotism or Prestige
Lefineder’s Substack 2023-03-23T00:00:30+00:00
MyHouse.wad
Terry's Free Game of the Week 2023-03-24T07:37:35+00:00
100 win-streak by GJ
The Ground Gives Way 2023-03-26T14:16:06+00:00
Regular Home Renovation Simulator
Terry's Free Game of the Week 2023-03-30T03:51:52+00:00
The Joy of Computer History Books
Fabien Sanglard 2023-04-01T00:00:00+00:00
Trunk Updates 1 April 2023 and Tournament Announcement
Dungeon Crawl Stone Soup 2023-04-01T05:56:10+00:00
Sprouts 2023 Talks
Combinatorial Game Theory 2023-04-02T02:21:00+00:00
Magpie
Terry's Free Game of the Week 2023-04-06T13:34:10+00:00
The Father of Home Video Games
Ironic Sans 2023-04-11T15:55:49+00:00
Why Janet?
Ian Henry 2023-04-12T00:00:00+00:00
Knowing how to measure
Apperceptive by Sam 2023-04-14T12:16:09+00:00
Sylvie wasn’t feeling well today so she went on an adventure to meet different kittens
Terry's Free Game of the Week 2023-04-14T15:15:32+00:00
Generalized Macros
Ian Henry 2023-04-18T00:00:00+00:00
Joey Wamone’s Normal Bedtime Routine That Is Absolutely Not A Recurring Tooth Decay Nightmare
Terry's Free Game of the Week 2023-04-20T20:04:42+00:00
More CGA CRTC Glitching: HD6845(R) vs. MC6845
int10h.org - VileR's blog 2023-04-22T09:36:31+00:00
Darth Vader answers the Proust Questionnaire
Ironic Sans 2023-04-25T15:55:13+00:00
Battlefront II: Layered Explosion
Simonschreibt. 2023-04-25T19:39:38+00:00
TRAUMAKT~4.SEXE
Terry's Free Game of the Week 2023-04-28T08:51:28+00:00
The Four Vertex Volume
Simonschreibt. 2023-04-30T16:36:18+00:00
Driving Compilers
Fabien Sanglard 2023-05-03T00:00:00+00:00
Letters To The Editor
Fujichia 2023-05-03T14:52:18+00:00
Combat Mode
The Ground Gives Way 2023-05-03T17:56:35+00:00
Japanese Money Simulator
Terry's Free Game of the Week 2023-05-04T09:05:03+00:00
Deserve’s Got Nothing To Do With It
The Popehat Report 2023-05-05T14:54:27+00:00
0.30: “The Reavers Return”
Dungeon Crawl Stone Soup 2023-05-05T16:55:51+00:00
IREM game collections for me to collect ⊟
Tiny Cartridge 3DS 2023-05-05T19:05:40+00:00
Is artificial intelligence as a term per se racist?
Apperceptive by Sam 2023-05-05T19:37:19+00:00
The Divine Fire
Simonschreibt. 2023-05-05T22:34:32+00:00
Pokémon – Rapidash
Simonschreibt. 2023-05-07T20:05:37+00:00
It’s a Media Roundup!
Ironic Sans 2023-05-09T15:55:09+00:00
The wait for Gekisou! Benza Race - Toilet Shooting Star is almost over
Tiny Cartridge 3DS 2023-05-11T13:24:50+00:00
REALITY_ENDS S1
Terry's Free Game of the Week 2023-05-11T21:45:58+00:00
Zaga-33 reborn
Mighty Vision 2023-05-13T12:25:00+00:00
Green New Deal Simulator
Terry's Free Game of the Week 2023-05-18T17:52:51+00:00
Sillypaste migrated to fly.io
Winny's Blog 2023-05-19T00:30:00+00:00
0.30 Tournament Results
Dungeon Crawl Stone Soup 2023-05-22T04:02:15+00:00
Jedi: Fallen Order – Splishy Splashy
Simonschreibt. 2023-05-22T17:18:57+00:00
I Get No Mail and It’s Glorious
Ironic Sans 2023-05-23T15:55:49+00:00
Special Delivery
Terry's Free Game of the Week 2023-05-24T19:23:33+00:00
Inequality, galactic and planetary
Lefineder’s Substack 2023-05-26T00:49:26+00:00
Deus Ex – Alpha Terrain
Simonschreibt. 2023-05-28T12:36:14+00:00
Speech or Cancel Culture At Boston University?
The Popehat Report 2023-05-31T17:55:05+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2023-05-31T21:10:15+00:00
The Logic of Baseball
Ironic Sans 2023-06-06T15:55:54+00:00
My first game jam
Winny's Blog 2023-06-08T08:30:00+00:00
I’m coming around to the view that this is, in fact, The Greatest Witch Hunt of All Time. The other…
Squashed 2023-06-09T18:20:00+00:00
Evennia 2.0.0 released today
Evennia Devblog RSS Feed 2023-06-10T00:00:00+00:00
Jack Smith, Donald Trump, and the Kobayashi Maru
The Popehat Report 2023-06-10T18:54:28+00:00
From Danni Storm’s exhibition W(ord)s & Weavings, which just finished in Copenhagen. Typed on a…
⌘+V 2023-06-11T02:31:04+00:00
GTA V – Underestimated Glow
Simonschreibt. 2023-06-11T19:47:24+00:00
That's Not How Recusal Works, That's Not How Any Of This Works!
The Popehat Report 2023-06-13T18:17:44+00:00
Anti-Personas
ignorethecode.net 2023-06-13T21:51:24+00:00
Thanks @screenrant
garfield minus garfield 2023-06-14T20:19:09+00:00
snjmrkm: (via Space Invaders 45th Anniversary...
⌘+V 2023-06-15T01:59:07+00:00
Typewriter work by Dirk Krecker, 2014.
⌘+V 2023-06-15T01:59:29+00:00
garadinervi: Heinz Kroehl, Kroehl – Images, Landesmuseum Mainz,...
⌘+V 2023-06-15T01:59:57+00:00
text-mode: By Siggi Eggertsson, 2014.
⌘+V 2023-06-15T02:01:23+00:00
By Mark Webster.
⌘+V 2023-06-15T02:01:27+00:00
By Mark Webster.
⌘+V 2023-06-15T02:01:32+00:00
worldsofzzt: Source “Rotten Robots 2: Revenge of SID” by Caspar...
⌘+V 2023-06-15T02:01:49+00:00
Faun. PETSCII by Electric, 2023.
⌘+V 2023-06-15T02:01:53+00:00
moji: japanese matchbox labels
⌘+V 2023-06-15T02:02:15+00:00
worldsofzzt: Source “Castaway” by Unknown (1996) [CASTAWAY.ZZT]...
⌘+V 2023-06-15T02:02:50+00:00
worldsofzzt: Source “Myst Portal” by Chefchen HK...
⌘+V 2023-06-15T02:03:14+00:00
text-mode: Dirk Krecker
⌘+V 2023-06-15T02:03:28+00:00
Racket frustrates me
Winny's Blog 2023-06-16T08:30:00+00:00
Dead Man's Isle - Astoria, OR
You Care What We Think 2023-06-16T15:50:00+00:00
Good Vibrations
Fabien Sanglard 2023-06-17T00:00:00+00:00
Sinister strike
Lefineder’s Substack 2023-06-18T19:00:47+00:00
The Segway Inventor and His Comic Book Father
Ironic Sans 2023-06-20T15:55:09+00:00
Cannon Beach Hardware & Public House - Cannon Beach, OR
You Care What We Think 2023-06-20T16:49:00+00:00
Streak Redemption
ignorethecode.net 2023-06-26T20:30:44+00:00
Supreme Court Clarifies "True Threats" First Amendment Exception
The Popehat Report 2023-06-27T19:35:46+00:00
50 vogels – a project to print 50 birds each with 16×16 LEGO pieces. Made by Roy Scholten and…
⌘+V 2023-06-28T07:27:19+00:00
A Light Melancholy
Romain Laurent 2023-06-29T21:39:22+00:00
The Story of The First Software Patent
Ironic Sans 2023-07-04T15:55:58+00:00
My Kind of REPL
Ian Henry 2023-07-05T00:00:00+00:00
Raising the Bar for IBM PC/XT Emulation: MartyPC
int10h.org - VileR's blog 2023-07-05T14:37:21+00:00
> 177: Me claiming I could fix it
Laura Olin 2023-07-13T08:00:00+00:00
Tricking Monty Hall
ignorethecode.net 2023-07-15T11:20:38+00:00
The Finite Faculties of Man
Lefineder’s Substack 2023-07-16T16:11:35+00:00
10NES
Fabien Sanglard 2023-07-18T00:00:00+00:00
3D Gaming Before VR
Ironic Sans 2023-07-18T15:55:03+00:00
When did people stop being drunk all the time?
Lefineder’s Substack 2023-07-18T19:38:38+00:00
Carts of Carnage
Lefineder’s Substack 2023-07-19T19:18:08+00:00
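"When an older brother splits a piece of gum in two and hands half to his younger brother saying 'here you go', that act is what matters. If it came already split, he couldn't do that."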
⌘+V 2023-07-20T04:57:12+00:00
The Meat of Man the Hunter
Lefineder’s Substack 2023-07-21T23:59:03+00:00
Web Environment Integrity vs. Private Access Tokens - They're the same thing!
Juho Snellman's Weblog 2023-07-25T18:30:00+00:00
Helpful and unhelpful anthropomorphism
Apperceptive by Sam 2023-07-26T16:42:13+00:00
Hunter Biden And The Fog Of War
The Popehat Report 2023-07-26T21:04:34+00:00
Commander Keen: Adaptive Tile Scrolling
Fabien Sanglard 2023-07-27T00:00:00+00:00
Wonderful Toolchain project update - July 2023
Posts on asie's blog 2023-07-30T00:00:00+00:00
The Fibonacci Matrix
Ian Henry 2023-07-30T00:00:00+00:00
The value-destroying potential of AI
Apperceptive by Sam 2023-07-31T14:37:29+00:00
Rethinking Window Management
ignorethecode.net 2023-07-31T20:28:26+00:00
Is Following An Extradition Treaty An Elaborate Political Conspiracy?
The Popehat Report 2023-08-01T00:11:53+00:00
New Evidence in 100-Year-Old Claim of Amateurs Accomplishing What Experts Couldn’t
Ironic Sans 2023-08-01T15:55:54+00:00
People Are Lying To You About The Trump Indictment
The Popehat Report 2023-08-02T17:17:00+00:00
Representing Heterogeneous Data
journal.stuffwithstuff.com 2023-08-04T07:00:00+00:00
Nix / NixOS misconceptions
Winny's Blog 2023-08-06T05:00:00+00:00
Beware The Flood Of Trump Sentencing Disinformation
The Popehat Report 2023-08-06T21:31:29+00:00
The National Review Is Still Lying To You About The Fraud Charge Against Trump
The Popehat Report 2023-08-08T01:37:37+00:00
Vim Boss
Neovim 2023-08-09T00:00:00+00:00
Understanding (and) psychology
Apperceptive by Sam 2023-08-09T14:29:28+00:00
The Weight Of The Unspoken Word
The Popehat Report 2023-08-10T19:30:06+00:00
mDNS Primer
Fabien Sanglard 2023-08-11T00:00:00+00:00
Ode to the M1
Fabien Sanglard 2023-08-12T00:00:00+00:00
More Calories Less Crime
Lefineder’s Substack 2023-08-12T22:02:32+00:00
The Magician, The Artist & The Mathematician
Ironic Sans 2023-08-15T15:55:59+00:00
Overt Acts and Predicate Acts, Explained
The Popehat Report 2023-08-17T16:39:42+00:00
Browsing the web with a WonderSwan in 2023
Posts on asie's blog 2023-08-19T11:20:00+00:00
> 178: The footing is ambiguous
Laura Olin 2023-08-24T13:12:58+00:00
2023 Minnesota State Fair - St. Paul, MN
You Care What We Think 2023-08-25T02:08:00+00:00
Wonderful Toolchain project update - August 2023
Posts on asie's blog 2023-08-27T00:00:00+00:00
The Great Emu War
Ironic Sans 2023-08-29T15:55:52+00:00
End of August Links
Emily Short's Interactive Storytelling 2023-08-31T14:57:41+00:00
I have such a migraine
Ironic Sans 2023-09-12T15:55:11+00:00
48: Biff with Jacob O'Bryant
The REPL 2023-09-16T02:08:06+00:00
The cost we bear
Apperceptive by Sam 2023-09-22T15:31:29+00:00
Make Your Own Mon-Yu ⊟
Tiny Cartridge 3DS 2023-09-22T21:32:56+00:00
Exploring Command-line space time
Fabien Sanglard 2023-09-26T00:00:00+00:00
Where “Matrix Ping Pong” Came From
Ironic Sans 2023-09-26T15:55:10+00:00
> 179: The age of divestment
Laura Olin 2023-10-05T12:30:25+00:00
Tori (Ramen) - St. Paul, MN
You Care What We Think 2023-10-06T14:24:00+00:00
Forty years of programming
Fabien Sanglard 2023-10-08T00:00:00+00:00
I for one am looking forward to the release of that gender-flipped Rise and Fall of the Roman…
Squashed 2023-10-08T00:26:44+00:00
I Can’t Believe The Navy Gave Me So Much Access
Ironic Sans 2023-10-10T15:55:09+00:00
NeXt...
Source Code in TV and Films 2023-10-11T16:02:29+00:00
NeXt (2020 series) I know I already shared something from this...
Source Code in TV and Films 2023-10-11T16:02:46+00:00
Big Duck Energy
Wild Information 2023-10-15T14:01:17+00:00
Does Go Have Subtyping?
journal.stuffwithstuff.com 2023-10-19T07:00:00+00:00
Astro-Dodge's Dirty Video Tricks
int10h.org - VileR's blog 2023-10-20T17:35:24+00:00
Why I’m Still Not Sick of ChatGPT
Ironic Sans 2023-10-24T15:55:16+00:00
Sway review
Winny's Blog 2023-10-26T05:00:00+00:00
To Boldly Go Down The Hall
Wild Information 2023-10-26T22:14:01+00:00
I knew this was coming
Apperceptive by Sam 2023-10-27T16:29:26+00:00
Why Do Peephole Optimizations Work?
Embedded in Academia 2023-11-01T16:23:20+00:00
> 180: You want to see my hands?
Laura Olin 2023-11-02T11:41:40+00:00
How the field of "AI" got like this
Apperceptive by Sam 2023-11-02T14:40:06+00:00
0x4 reasons to write and publish
Fabien Sanglard 2023-11-07T00:00:00+00:00
A Celebrity in Every Taxi
Ironic Sans 2023-11-07T16:55:17+00:00
The bash book to rule them all
Fabien Sanglard 2023-11-08T00:00:00+00:00
Fear of Trees
Wild Information 2023-11-12T16:00:31+00:00
Moving from Team17
GAMEPOPPER 2023-11-13T16:10:12+00:00
My Free Speech Means You Have To Shut Up
The Popehat Report 2023-11-20T03:11:56+00:00
Have You Heard About Montana?!
Ironic Sans 2023-11-21T16:55:33+00:00
In Which I Repent On Free Speech Culture
The Popehat Report 2023-11-22T01:42:44+00:00
Dialogue Expressiveness in Mask of the Rose
Emily Short's Interactive Storytelling 2023-11-22T17:51:48+00:00
How Apple's Pro Display XDR takes Thunderbolt 3 to its limit
Fabien Sanglard 2023-11-23T00:00:00+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:25:09+00:00
“Upon seeing Daniel Root’s photographs of Manhattan’s downtown bars, I was immediately taken by the…
EV NY: 30 yrs and now 2023-11-26T13:28:41+00:00
“Daniel Root’s photos of New York bars at dawn are a perfect blend of beauty and melancholia….
EV NY: 30 yrs and now 2023-11-26T13:29:33+00:00
“I’ve always said the very best bars are inviting, whether packed or empty. Daniel Root’s amazing…
EV NY: 30 yrs and now 2023-11-26T13:30:48+00:00
Forty years in NYC, in those very neighborhoods, having seen countless NYC pictures and yet here was…
EV NY: 30 yrs and now 2023-11-26T13:32:00+00:00
“New York bars at Dawn”
EV NY: 30 yrs and now 2023-11-26T13:34:51+00:00
“New York Bars at Dawn”
EV NY: 30 yrs and now 2023-11-26T13:36:36+00:00
“New York Bars at Dawn”
EV NY: 30 yrs and now 2023-11-26T13:38:36+00:00
“New York Bars at Dawn”
EV NY: 30 yrs and now 2023-11-26T13:39:52+00:00
“New York Bars at Dawn”
EV NY: 30 yrs and now 2023-11-26T13:41:32+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:53:10+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:53:49+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:54:26+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:54:55+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:55:30+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:56:21+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:57:04+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:57:43+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:58:38+00:00
untitled
EV NY: 30 yrs and now 2023-11-26T13:59:27+00:00
Punishment Envy And The Perils Of Institutional Engagement
The Popehat Report 2023-11-28T22:32:42+00:00
I Fight For The Users
Coding Horror 2023-11-30T20:11:05+00:00
Living in a Lucid Dream
Wild Information 2023-11-30T23:14:52+00:00
Suit Viewing Opportunities
Fujichia 2023-12-05T15:21:03+00:00
Gift Guide For Fictional Characters
Ironic Sans 2023-12-05T16:55:21+00:00
Stop Demanding Dumb Answers To Hard Questions
The Popehat Report 2023-12-07T17:36:36+00:00
The emptiness at the heart of emotion recognition
Apperceptive by Sam 2023-12-08T13:41:24+00:00
Trunk Updates 11 December 2023 and Tournament Announcement
Dungeon Crawl Stone Soup 2023-12-11T20:05:28+00:00
Cup of Coffee: December 18, 2023
Cup of Coffee by Craig Calcaterra 2023-12-18T11:10:18+00:00
Cup of Coffee: December 19, 2023
Cup of Coffee by Craig Calcaterra 2023-12-19T11:10:34+00:00
CERN
ntoll.org 2023-12-19T13:30:00+00:00
How Are You? Just Give Me Your Stock Answer.
Ironic Sans 2023-12-19T16:55:39+00:00
Cup of Coffee: December 20, 2023
Cup of Coffee by Craig Calcaterra 2023-12-20T11:10:49+00:00
My 2023 in review
Winny's Blog 2023-12-21T06:00:00+00:00
Cup of Coffee: December 21, 2023
Cup of Coffee by Craig Calcaterra 2023-12-21T11:10:34+00:00
UK TV series called COBRA: Cyberwar starring Robert Carlyle - I...
Source Code in TV and Films 2023-12-21T18:30:35+00:00
Didn’t know if this blog was still a thing, but here it is!...
Source Code in TV and Films 2023-12-21T18:30:38+00:00
Substack's response to Substackers against Nazis sucks
Cup of Coffee by Craig Calcaterra 2023-12-21T19:37:19+00:00
Substack Has A Nazi Opportunity
The Popehat Report 2023-12-21T20:25:33+00:00
Cup of Coffee: December 22, 2023
Cup of Coffee by Craig Calcaterra 2023-12-22T11:10:31+00:00
Cup of Coffee: Merry Christmas!
Cup of Coffee by Craig Calcaterra 2023-12-25T11:41:38+00:00
Cup of Coffee: December 27, 2023
Cup of Coffee by Craig Calcaterra 2023-12-27T11:10:26+00:00
Cup of Coffee: December 28, 2023
Cup of Coffee by Craig Calcaterra 2023-12-28T11:10:54+00:00
May A Public University Fire Its Chancellor For Appearing In Porn Videos On His Own Time?
The Popehat Report 2023-12-28T23:38:26+00:00
Cup of Coffee: December 29, 2023
Cup of Coffee by Craig Calcaterra 2023-12-29T11:10:19+00:00
49: Clerk with Martin Kavalar
The REPL 2023-12-29T21:23:50+00:00
How bad are search results? Let's compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT
Dan Luu 2023-12-30T00:00:00+00:00
Why Android developers no longer need Windows USB drivers
Fabien Sanglard 2023-12-30T00:00:00+00:00
The Time of Big Walking
Wild Information 2023-12-31T21:21:38+00:00
Full UI Upscaling, Part 1: History and Theory
Grid Sage Games 2024-01-02T04:04:36+00:00
Cup of Coffee: January 2, 2024
Cup of Coffee by Craig Calcaterra 2024-01-02T11:10:30+00:00
Upgrading my Workstation to NixOS 23.11
Winny's Blog 2024-01-03T06:00:00+00:00
Cup of Coffee: January 3, 2024
Cup of Coffee by Craig Calcaterra 2024-01-03T11:10:08+00:00
Cup of Coffee: January 4, 2024
Cup of Coffee by Craig Calcaterra 2024-01-04T11:10:47+00:00
Cup of Coffee: January 5, 2024
Cup of Coffee by Craig Calcaterra 2024-01-05T11:10:30+00:00
Full UI Upscaling, Part 2: Holy Mockups!
Grid Sage Games 2024-01-05T13:50:40+00:00
How LLMs are and are not like the brain
Apperceptive by Sam 2024-01-05T16:57:51+00:00
Heraclitus: The Unity of Opposites
ntoll.org 2024-01-07T11:30:00+00:00
Cup of Coffee: January 8, 2024
Cup of Coffee by Craig Calcaterra 2024-01-08T11:10:21+00:00
Moving to Ghost 👻
Ironic Sans 2024-01-08T19:41:06+00:00
Cup of Coffee: January 9, 2024
Cup of Coffee by Craig Calcaterra 2024-01-09T11:10:23+00:00
Multiple arguments in shebang
Winny's Blog 2024-01-10T06:00:00+00:00
Cup of Coffee: January 10, 2024
Cup of Coffee by Craig Calcaterra 2024-01-10T11:11:00+00:00
Cup of Coffee: January 11, 2024
Cup of Coffee by Craig Calcaterra 2024-01-11T11:10:16+00:00
> 181: It has taken all our strength
Laura Olin 2024-01-11T12:41:25+00:00
Full UI Upscaling, Part 3: Dynamic Terminal Swapping
Grid Sage Games 2024-01-12T06:31:24+00:00
Cup of Coffee: January 12, 2024
Cup of Coffee by Craig Calcaterra 2024-01-12T11:10:25+00:00
Win A Dream Date With A Litigious Douchebag!
The Popehat Report 2024-01-12T17:27:24+00:00
Cup of Coffee has moved to Beehiiv
Cup of Coffee by Craig Calcaterra 2024-01-14T22:02:26+00:00
Another NixOS 23.11 upgrade gotcha
Winny's Blog 2024-01-15T06:00:00+00:00
0.31 Tournament Page
Dungeon Crawl Stone Soup 2024-01-17T03:32:35+00:00
Full UI Upscaling, Part 4: Simpler Lightweight Fonts
Grid Sage Games 2024-01-19T05:34:15+00:00
0.31 “The Alchemy of Forms”
Dungeon Crawl Stone Soup 2024-01-19T06:36:33+00:00
Destructive investing and the siren song of software
Apperceptive by Sam 2024-01-19T21:07:06+00:00
How the DevTeam conquered the iPhone
Fabien Sanglard 2024-01-21T00:00:00+00:00
Games at Mumbai, Day 0 Talks
Combinatorial Game Theory 2024-01-22T04:02:00+00:00
Games at Mumbai, Day 1 Talks
Combinatorial Game Theory 2024-01-22T14:15:00+00:00
Games at Mumbai, Day 2 Talks
Combinatorial Game Theory 2024-01-23T12:51:00+00:00
Games at Mumbai, Day 3 Talks
Combinatorial Game Theory 2024-01-24T13:20:00+00:00
Why do people post on [bad platform] instead of [good platform]?
Dan Luu 2024-01-25T00:00:00+00:00
Games at Mumbai, Day 4 (Final) of Talks
Combinatorial Game Theory 2024-01-25T11:41:00+00:00
Games at Mumbai: Not the Talks
Combinatorial Game Theory 2024-01-25T12:44:00+00:00
How to Learn Nix, Part 48: Installing (single-user) Nix on macOS
Ian Henry 2024-01-26T00:00:00+00:00
Test your backups
Winny's Blog 2024-01-27T06:00:00+00:00
How to Learn Nix, Part 49: nix-direnv is a huge quality of life improvement
Ian Henry 2024-01-28T00:00:00+00:00
Notes on Cruise's pedestrian accident
Dan Luu 2024-01-29T00:00:00+00:00
The Popehat Report Is Moving To Beehiiv
The Popehat Report 2024-01-30T22:32:46+00:00
Would the Buddha Wear a Walkman?
Wild Information 2024-02-02T18:38:01+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2024-02-04T15:44:51+00:00
I haven’t posted anything on here in years, but I thought it’d be funny to just drop this lil Claude…
Zac Gorman 2024-02-05T17:20:30+00:00
Why it's impossible to agree on what's allowed
Dan Luu 2024-02-07T00:00:00+00:00
Why those "training data poisoning" gimmicks don't really work
Apperceptive by Sam 2024-02-09T13:42:58+00:00
Adventures in Map Zooming, Part 5: QoL
Grid Sage Games 2024-02-11T08:30:03+00:00
Game Font Forensics
int10h.org - VileR's blog 2024-02-11T20:26:57+00:00
0.31 Tournament Results
Dungeon Crawl Stone Soup 2024-02-12T05:02:19+00:00
Diseconomies of scale in fraud, spam, support, and moderation
Dan Luu 2024-02-18T00:00:00+00:00
Brighter Than a Cloud
Wild Information 2024-02-18T17:21:57+00:00
Full UI Upscaling, Part 5: Completion and Demos
Grid Sage Games 2024-02-23T03:25:55+00:00
50: Peter Taoussanis
The REPL 2024-02-27T08:00:00+00:00
51: Building a text editor with Nate Hunzaker
The REPL 2024-03-05T01:00:00+00:00
Tiger Unlimited
Fujichia 2024-03-09T22:15:24+00:00
How web bloat impacts users with slow devices
Dan Luu 2024-03-16T00:00:00+00:00
Steal These Surface Duo Ideas
ignorethecode.net 2024-03-16T10:32:53+00:00
When Animals Dream
Wild Information 2024-03-17T15:30:46+00:00
Cozy Space Survivors
Simonschreibt. 2024-03-19T18:34:14+00:00
I.F.O. (Identified Flying Object) (1987) atari basic source...
Source Code in TV and Films 2024-03-24T17:14:01+00:00
JIM & KERRY
Infinite Gossip 2024-03-25T05:15:40+00:00
> 182: Do you trust me? Do I trust you?
Laura Olin 2024-03-28T08:00:00+00:00
The hearts of the Super Nintendo
Fabien Sanglard 2024-04-01T00:00:00+00:00
The Victorian Python Community (an Allegory)
ntoll.org 2024-04-01T05:30:00+00:00
thunder cracks, mysterious rattling sounds ⊟
Tiny Cartridge 3DS 2024-04-04T14:30:03+00:00
CDI is Now Official and CKO is Going Offline
Dungeon Crawl Stone Soup 2024-04-07T01:05:34+00:00
The evolution of the Super Nintendo motherboard
Fabien Sanglard 2024-04-08T00:00:00+00:00
I DISCOVERED A NEW FRUIT
Infinite Gossip 2024-04-09T06:14:44+00:00
Daylight Saving Time
ignorethecode.net 2024-04-14T10:25:05+00:00
52: Coding in YAML with Ingy döt Net
The REPL 2024-04-14T19:00:00+00:00
Update Preview: Blood in the Water
Barotrauma 2024-04-19T15:19:06+00:00
Dataflow Analyses and Compiler Optimizations that Use Them, for Free
Embedded in Academia 2024-04-20T21:55:33+00:00
Inside the Super Nintendo cartridges
Fabien Sanglard 2024-04-21T00:00:00+00:00
53: Clojure LSP with Eric Dallo
The REPL 2024-04-21T20:33:31+00:00
Sprouts 2024 Talks
Combinatorial Game Theory 2024-04-23T20:29:00+00:00
The Pacification of War
Lefineder’s Substack 2024-04-23T22:34:28+00:00
Live Lone and Prosper
Lefineder’s Substack 2024-04-26T18:30:59+00:00
A Forest from the Moon
Wild Information 2024-04-28T17:09:34+00:00
> 183: He stole forsythia.
Laura Olin 2024-05-02T12:00:00+00:00
paste.winny.tech (Sillypaste) is dead
Winny's Blog 2024-05-04T05:00:00+00:00
Pair Your Compilers At The ABI Café - Faultlore
Faultlore 2024-05-05T00:00:00+00:00
LET ME TELL YOU ABOUT MY SAAB VIII
Infinite Gossip 2024-05-07T03:07:31+00:00
Hyperlink Island
Wild Information 2024-05-12T14:00:59+00:00
What Waymo's NHTSA investigation says about how far along autonomous cars are
Apperceptive by Sam 2024-05-15T15:45:20+00:00
Neovim 0.10
Neovim 2024-05-16T00:00:00+00:00
New Guy Alert
Fujichia 2024-05-16T17:20:07+00:00
THE LOCAL GHOSTS
Infinite Gossip 2024-05-20T22:45:07+00:00
The Lunacy of Artemis
Idle Words 2024-05-24T10:12:00+00:00
What the FTC got wrong in the Google antitrust investigation
Dan Luu 2024-05-26T00:00:00+00:00
HOW DO WE KILL CHILDREN
Infinite Gossip 2024-05-28T07:14:50+00:00
> 184: We love what we have, no matter how little
Laura Olin 2024-05-30T12:00:00+00:00
Supervision and truth
Apperceptive by Sam 2024-05-31T16:23:09+00:00
On Paying Attention
ntoll.org 2024-06-01T12:00:00+00:00
Pleasant Realms
Fujichia 2024-06-06T14:22:46+00:00
Preview: Summer Update 2024
Barotrauma 2024-06-07T15:36:34+00:00
The Inner Space Race
Wild Information 2024-06-09T15:12:24+00:00
Other Worlds Zine Fair - Marrickville 23/6
Infinite Gossip 2024-06-14T03:56:05+00:00
A discussion of discussions on AI bias
Dan Luu 2024-06-16T00:00:00+00:00
> 185: Run them through butter
Laura Olin 2024-06-27T12:00:00+00:00
Revisiting Number Theory and the Impossible Puzzle
a blog by biggiemac42 2024-06-29T06:52:02+00:00
The Queen's Doll's House
Wild Information 2024-07-02T19:23:42+00:00
TOI-700
Infinite Gossip 2024-07-05T04:20:04+00:00
Institute for Controlled Speleogenesis
BLDGBLOG 2024-07-08T02:18:04+00:00
Fireside Chat: Founders Inc
Vjeux 2024-07-14T02:42:08+00:00
Podcast: Software Engineering Daily
Vjeux 2024-07-14T02:45:11+00:00
Podcast: devtoolsFM
Vjeux 2024-07-14T02:46:25+00:00
Podcast: Coder pour changer une vie
Vjeux 2024-07-14T02:47:53+00:00
Podcast: Changelog
Vjeux 2024-07-14T02:49:30+00:00
Panel on Layout Performance – EdgeConf 4
Vjeux 2024-07-14T02:52:08+00:00
React Documentary
Vjeux 2024-07-14T02:54:19+00:00
CPUID instruction and table
Winny's Blog 2024-07-15T05:00:00+00:00
New Crawl Servers and a Possible Server Retirement
Dungeon Crawl Stone Soup 2024-07-20T00:57:05+00:00
Dollar Country Newsletter, July 2024
Dollar Country Newsletter & Radio Show 2024-07-22T00:30:54+00:00
Update the NAS to 24.05
Winny's Blog 2024-07-23T05:00:00+00:00
> 186: Synonyms haunted. Synonyms meaningful.
Laura Olin 2024-07-25T13:17:45+00:00
Episode 248 & 249: Ol' Bertha Needs A Big Al / I'm Out On The Town
Dollar Country Newsletter & Radio Show 2024-07-26T20:46:45+00:00
Carving the Super Nintendo Video System
Fabien Sanglard 2024-07-29T00:00:00+00:00
ONE EYE
Infinite Gossip 2024-07-30T06:58:41+00:00
Tintype of a handsome dandy with fabulous hair, c. 1860s
dead gorgeous 2024-07-31T00:39:49+00:00
The Colony Makes The World
Wild Information 2024-08-04T20:01:11+00:00
Daguerreotype of a tough guy missing an eye, half a pinky—and, perhaps, the subjects of the group…
dead gorgeous 2024-08-04T23:00:57+00:00
Daguerreotype of a stylish young swell possessed of the fine features and lofty brow that bring all…
dead gorgeous 2024-08-05T02:23:36+00:00
Carte de visite of strapping Swedish naval officer Jarl Christiersson, c. 1860
dead gorgeous 2024-08-05T16:04:38+00:00
Name the Non-Standard PC Code Page
int10h.org - VileR's blog 2024-08-07T08:43:47+00:00
The Curious Case of Col's Computational Complexity
Combinatorial Game Theory 2024-08-07T17:25:00+00:00
Gijs Gieskes "Zonnepanneel 2"
Pleasant Realms 2024-08-07T21:32:44+00:00
SNES: Sprites and backgrounds rendering
Fabien Sanglard 2024-08-09T00:00:00+00:00
How the SNES Graphics System works
Fabien Sanglard 2024-08-09T00:00:00+00:00
How good can you be at Codenames without knowing any words?
Dan Luu 2024-08-11T00:00:00+00:00
Quote-unquote "macros"
Ian Henry 2024-08-12T00:00:00+00:00
Kushkuli Box Competition
Pleasant Realms 2024-08-13T13:31:20+00:00
MY CONTACT AT THE RAT FACTORY
Infinite Gossip 2024-08-15T05:06:54+00:00
Watching sunsets
Fabien Sanglard 2024-08-18T00:00:00+00:00
Bake Notes 2024.08.17
Maybe Pizza? 2024-08-18T18:38:26+00:00
0.32 Release and Tournament
Dungeon Crawl Stone Soup 2024-08-19T03:34:02+00:00
Haunted Mansion Lights On
Pleasant Realms 2024-08-20T13:06:02+00:00
Magic, Modified
Demon 2024-08-21T17:08:02+00:00
2024 Minnesota State Fair - Falcon Heights, MN
You Care What We Think 2024-08-23T00:38:00+00:00
Late summer greetings
Barotrauma 2024-08-23T14:21:09+00:00
Dollar Country Newsletter, August 2024
Dollar Country Newsletter & Radio Show 2024-08-25T12:00:28+00:00
Carte de visite of three British soldiers on beer break, c. 1860s
dead gorgeous 2024-08-27T16:03:51+00:00
Absolutely stunning post-mortem daguerreotype of a young man with killer cheekbones and haunting…
dead gorgeous 2024-08-27T22:01:53+00:00
2024 Minnesota State Fair (Take 2) – Falcon Heights, MN
You Care What We Think 2024-08-28T14:13:00+00:00
Ambrotype of a jovial gent who won’t allow an injured arm to cramp his style, c. 1860s
dead gorgeous 2024-08-28T16:02:47+00:00
CHLOE, 21 FROM STOCKPORT
Infinite Gossip 2024-08-28T23:41:30+00:00
Beethoven “Moonlight Sonata” for Old Elephant
Pleasant Realms 2024-08-29T18:33:11+00:00
0.32 “Gods and Makers”
Dungeon Crawl Stone Soup 2024-08-29T22:20:16+00:00
Ambrotype of two boxers about to engage, c. 1850s
dead gorgeous 2024-08-30T06:28:41+00:00
Ambrotype of a pipe-puffing pair of comrades in arms, c. 1861-65
dead gorgeous 2024-08-30T21:42:21+00:00
Postcard of a very refined young man reading a letter with the precise degree of drama appropriate…
dead gorgeous 2024-08-31T13:24:21+00:00
Cabinet card of “BEAUTY,” the Male Chick-Rearing Cat, 1889. As the back of the card enthuses:
dead gorgeous 2024-08-31T18:02:07+00:00
I hope you use ShellCheck
Winny's Blog 2024-09-01T05:00:00+00:00
Northbound Smokehouse & Brewpub - Minneapolis, MN
You Care What We Think 2024-09-01T16:30:00+00:00
Stereoview of a hussar and his sweetheart, or perhaps ex-sweetheart, c. 1850s
dead gorgeous 2024-09-01T23:21:26+00:00
Multitile Actors, Revisited
Grid Sage Games 2024-09-03T08:00:05+00:00
Stereoscopic daguerreotype of two men playing chess in front of a mirror, c. 1840s
dead gorgeous 2024-09-03T17:34:03+00:00
Daguerreotype of a gentleman with hard, hawkish eyes and a prim little kitten bow, c. 1840s
dead gorgeous 2024-09-04T17:37:57+00:00
Looking for Missed Alarm Bugs in a Formal Verification Tool
Embedded in Academia 2024-09-04T18:29:03+00:00
> 187: Colours dull with injustice etc.,
Laura Olin 2024-09-05T13:47:52+00:00
How To Turn A Sphere Inside Out
Pleasant Realms 2024-09-05T15:43:11+00:00
Daguerreotype of a gentleman with an artfully tied blue silk cravat, c. 1840s
dead gorgeous 2024-09-05T19:10:40+00:00
How Long Does a Grain Revolution Take?
Maybe Pizza? 2024-09-06T21:47:14+00:00
Clyde's Drive-In - Manistique, MI
You Care What We Think 2024-09-06T22:02:00+00:00
Shellcheck and Emacs
Winny's Blog 2024-09-08T05:00:00+00:00
Detail from a cabinet card of two officers with their arms entwined, their swords well-hung, and…
dead gorgeous 2024-09-08T18:29:32+00:00
Pizza Roundup 2024.09
Maybe Pizza? 2024-09-09T16:32:19+00:00
Carte de visite of a serenely self-assured young naval officer identified on reverse as Octave…
dead gorgeous 2024-09-10T01:10:55+00:00
THE BALLAD OF THE HOLLYWOOD CASTING DIRECTOR WHO SELECTS THE FIRST MANNED MISSION TO MARS
Infinite Gossip 2024-09-10T23:10:31+00:00
Sawmill Pizza and Brew Shed - Clear Lake, WI
You Care What We Think 2024-09-12T16:00:00+00:00
Everything Is Everything, by Koki Tanaka
Pleasant Realms 2024-09-12T16:29:26+00:00
I Broke It
nklein software 2024-09-13T15:46:01+00:00
Carte de visite of a richly attired Hungarian aristocrat resplendent in fur-trimmed cape and hessian…
dead gorgeous 2024-09-13T23:51:47+00:00
From a recovering former Python community member
ntoll.org 2024-09-16T17:00:00+00:00
54: JRuby with Charles Oliver Nutter
The REPL 2024-09-17T01:00:00+00:00
🦀 Four Thousand Weeks in Rust
Nathan Youngman 2024-09-18T00:00:00+00:00
Round Man Brewing Company - Spooner, WI
You Care What We Think 2024-09-18T16:00:00+00:00
Sneak peek: Alien ruin and husk improvements
Barotrauma 2024-09-20T15:14:52+00:00
Tom Vincent's Vincenzo's Pizzeria
Maybe Pizza? 2024-09-20T21:59:22+00:00
Reaching Long-Awaited Perfection on Opus Magnum’s Final Level
a blog by biggiemac42 2024-09-22T06:54:11+00:00
Being a Tech Art Detective
Simonschreibt. 2024-09-23T13:09:47+00:00
Laka Lono Rum Club - Omaha, NE
You Care What We Think 2024-09-24T17:00:00+00:00
0.32 Tournament Results
Dungeon Crawl Stone Soup 2024-09-25T02:32:37+00:00
I BUILT A TIME MACHINE
Infinite Gossip 2024-09-26T04:49:40+00:00
Separating Litharge at Top Speed
a blog by biggiemac42 2024-09-26T08:35:08+00:00
Steve Reich's "Clapping Music" on the el green line
Pleasant Realms 2024-09-26T21:25:28+00:00
Ray Tracing In One Weekend (in Lisp, and n-dimenions)
nklein software 2024-09-27T02:37:31+00:00
Bake Notes 2024.09.28: Oven Experiments With Bread
Maybe Pizza? 2024-09-29T00:24:44+00:00
Bake Notes 2024.09.29: Oven Experiments With Bread Part 2
Maybe Pizza? 2024-09-29T17:18:09+00:00
What's a Brain?
Wild Information 2024-09-29T21:08:05+00:00
The Rise of Kamikaze: Why Japan Turned to Suicide Attacks in WWII
Steelsnowflake 2024-09-30T09:04:46+00:00
Bake Notes 2024.09.30: Oven Experiments With Bread Part 3
Maybe Pizza? 2024-09-30T22:00:15+00:00
Ponder This Challenge - October 2024 - Splitting a number
IBM Ponder This 2024-10-01T00:00:00+00:00
Carte de visite of two fine fellows rocking like it’s 1892 (per date on reverse), with their little…
dead gorgeous 2024-10-01T03:36:13+00:00
Daguerreotype of a pair of bow-tied beaus, c. 1840s
dead gorgeous 2024-10-02T03:12:11+00:00
Bake Notes 2024.10.02: Oven Experiments With Bread Part 4
Maybe Pizza? 2024-10-03T04:11:33+00:00
> 188: safe through the generous fields
Laura Olin 2024-10-03T12:52:52+00:00
Sneak peek: PvP Overhaul
Barotrauma 2024-10-04T13:46:22+00:00
Bake Notes 2024.10.05: Oven Experiments With Bread Part 5 and 6
Maybe Pizza? 2024-10-05T18:27:45+00:00
Missing IBM PC Localization Disks & ROMs
int10h.org - VileR's blog 2024-10-06T20:59:38+00:00
Bake Notes 2024.10.07: Oven Experiments With Bread Part 7 - Success
Maybe Pizza? 2024-10-08T04:20:36+00:00
Bake Notes 2024.10.08: Oven Experiments With Bread Part 8
Maybe Pizza? 2024-10-08T21:00:44+00:00
Celebrating New Achievements in NES Tetris
a blog by biggiemac42 2024-10-09T07:01:44+00:00
Isopod Terrarium
Pleasant Realms 2024-10-11T16:19:45+00:00
Bake Notes 2024.10.10: Oven Experiments With Bread Part 9
Maybe Pizza? 2024-10-11T20:09:38+00:00
Dollar Country Newsletter, October 2024
Dollar Country Newsletter & Radio Show 2024-10-13T12:03:19+00:00
The Hidden Bird Algorithm
Wild Information 2024-10-13T15:01:56+00:00
Pizza / Bread Roundup 002
Maybe Pizza? 2024-10-13T19:51:22+00:00
Bake Notes 2024.10.14 Oven Experiments With Bread Part 10
Maybe Pizza? 2024-10-14T22:22:56+00:00
Wonderful Toolchain project update - October 2024
Posts on asie's blog 2024-10-17T00:00:00+00:00
BAR
Infinite Gossip 2024-10-18T02:26:55+00:00
55: Instant: a modern Firebase in Clojure, with Stepan Parunashvili
The REPL 2024-10-18T06:00:32+00:00
Coming next week: Unto the Breach update
Barotrauma 2024-10-18T15:57:40+00:00
Sade Parking Lot
Pleasant Realms 2024-10-18T16:43:29+00:00
Cattle Grazing Is Not the Answer to Climate Change
Steelsnowflake 2024-10-19T11:01:40+00:00
The Empyrean’s New Clothes
Demon 2024-10-19T19:12:01+00:00
Bake Notes 2024.10.21: Oven Experiments With Bread Part 13
Maybe Pizza? 2024-10-21T20:37:35+00:00
Rainbow Gray
Steelsnowflake 2024-10-24T15:01:54+00:00
Steve Ballmer was an underrated CEO
Dan Luu 2024-10-28T00:00:00+00:00
Chilean Sea Bass
The Curiosity Cabinet 2024-10-28T02:37:03+00:00
Bach To The Future - Toccata and Fugue in D minor
Pleasant Realms 2024-10-29T16:23:21+00:00
THE WAR CRIMINALS AFTER THE WAR
Infinite Gossip 2024-10-30T21:43:38+00:00
Ponder This Challenge - November 2024 - Tetrahedron volumes
IBM Ponder This 2024-11-01T00:00:00+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2024-11-03T22:54:21+00:00
The Secret Ballot
The Curiosity Cabinet 2024-11-05T17:08:25+00:00
The Paradox of Progress: Mencken on Democracy
Steelsnowflake 2024-11-06T19:29:39+00:00
> 189: AIN' EVEN BEEN PLANTED YET
Laura Olin 2024-11-07T14:04:47+00:00
untitled
Terrible Banana 2024-11-11T01:31:08+00:00
A software controlled power supply for $25
The Grymoire 2024-11-11T18:31:50+00:00
Dollar Country Newsletter, November 2024
Dollar Country Newsletter & Radio Show 2024-11-11T23:56:51+00:00
Midwestern Luxury = Black Walnut Cake
Midwesterner 2024-11-13T13:02:59+00:00
THE SWIMMERS
Infinite Gossip 2024-11-13T21:38:36+00:00
Submarine Highlights (live right now!!!!)
Pleasant Realms 2024-11-14T16:24:21+00:00
Regarding the future of BlocksDS
Posts on asie's blog 2024-11-15T17:30:00+00:00
Rivers of Blood
Demon 2024-11-16T05:21:24+00:00
Episodes 250 & 251: Ten Mile Zone / It's Not Much But It's Home
Dollar Country Newsletter & Radio Show 2024-11-17T14:02:16+00:00
Sitters and Standers
The Pudding 2024-11-19T06:00:00+00:00
Hanky Pankies for the Holidays
Midwesterner 2024-11-20T13:00:49+00:00
Pizza Roundup 003
Maybe Pizza? 2024-11-20T17:40:12+00:00
crowfunding: 868-BACK
Mighty Vision 2024-11-21T14:49:00+00:00
untitled
Terrible Banana 2024-11-25T04:00:31+00:00
Perfect imperfection
medievalbooks 2024-11-25T11:15:34+00:00
Beastly beginnings
medievalbooks 2024-11-26T10:45:30+00:00
CROW FUN
Mighty Vision 2024-11-28T12:54:00+00:00
A COLONY OF MAGGOTS BUILDING A FOX THROUGH CAREFUL EFFORT
Infinite Gossip 2024-11-28T21:25:28+00:00
Green Bean Casserole
The Curiosity Cabinet 2024-11-29T01:20:43+00:00
Ray Tracing Extra-dimensional CSG Objects
nklein software 2024-11-30T15:43:17+00:00
Ponder This Challenge - December 2024 - Counting numbers with specific digits
IBM Ponder This 2024-12-01T00:00:00+00:00
Train Driver Record Hanoi to Ninh Bahn
Pleasant Realms 2024-12-02T14:11:03+00:00
The Great Filter Comes For Us All
Coding Horror 2024-12-02T18:25:46+00:00
The 2024 Midwesterner Gift Guide
Midwesterner 2024-12-05T13:02:59+00:00
Holiday greetings and update preview
Barotrauma 2024-12-05T14:34:30+00:00
> 190: What are you trying to be free of?
Laura Olin 2024-12-05T16:49:45+00:00
archive - patreon "about"
Mighty Vision 2024-12-08T21:37:00+00:00
Advent of Code 2024
Winny's Blog 2024-12-09T06:00:00+00:00
If the PO-33 K.O. was an OP-1
Spongefile 2024-12-17T11:39:23+00:00
Gym motivator sheet
Spongefile 2024-12-19T08:47:46+00:00
Year 11 of the Cogmind
Grid Sage Games 2024-12-20T02:15:27+00:00
LET ME TELL YOU ABOUT MY SAAB IX
Infinite Gossip 2024-12-23T23:01:45+00:00
Christmas Creep
The Curiosity Cabinet 2024-12-26T11:40:40+00:00
Come and See
Steelsnowflake 2024-12-26T13:41:23+00:00
Panettone, Taste Of Italy
Pleasant Realms 2024-12-27T14:33:26+00:00
Jimmy Carter's UFO
The Curiosity Cabinet 2024-12-30T04:54:41+00:00
Ponder This Challenge - January 2025 - The irrational three-jug problem
IBM Ponder This 2025-01-01T00:00:00+00:00
> 191: Under the new weight of the sun
Laura Olin 2025-01-02T16:18:09+00:00
Lost Obelisks
Demon 2025-01-05T17:10:53+00:00
Browser Bits
Winny's Blog 2025-01-07T06:00:00+00:00
Stay Gold, America
Coding Horror 2025-01-07T07:42:04+00:00
Black Coffee And Shimza Type Of Effects
Pleasant Realms 2025-01-09T16:26:29+00:00
Building Bauble
Ian Henry 2025-01-10T00:00:00+00:00
DAVID AND HIS BROTHERS
Infinite Gossip 2025-01-10T01:04:38+00:00
from Catching the Big Fish by David Lynch
.mattfraction 2025-01-16T21:10:32+00:00
January Blues (a personal update)
Steelsnowflake 2025-01-17T12:12:55+00:00
Finding 94123 Solutions to a Math Problem
a blog by biggiemac42 2025-01-19T21:53:00+00:00
making choices on server map - part 1
Mighty Vision 2025-01-23T22:52:00+00:00
My friend Michael
ntoll.org 2025-01-25T17:45:00+00:00
Dogecoin: A Series – Part 1
The Curiosity Cabinet 2025-01-26T15:01:54+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2025-01-27T21:01:43+00:00
21 best circular saw hacks
Pleasant Realms 2025-01-28T15:45:56+00:00
When the Sackler Brothers studied LSD
Res Obscura 2025-01-29T14:46:53+00:00
State Of The (Dollar) Country 2025
Dollar Country Newsletter & Radio Show 2025-01-29T20:32:21+00:00
making choices on server map - part 2
Mighty Vision 2025-01-30T23:00:00+00:00
Dollar Country Newsletter, January 2025
Dollar Country Newsletter & Radio Show 2025-01-31T14:03:33+00:00
Welcome to Europa, 2025
Barotrauma 2025-01-31T15:22:38+00:00
CGTC 5, Day 0 Talk
Combinatorial Game Theory 2025-01-31T22:13:00+00:00
CGTC 5, Day One Talks
Combinatorial Game Theory 2025-01-31T22:55:00+00:00
Ponder This Challenge - February 2025 - Prime number magic square
IBM Ponder This 2025-02-01T00:00:00+00:00
Pocket Operator sync modes explained
Spongefile 2025-02-01T00:59:11+00:00
Dogecoin: The Founders – Part 2
The Curiosity Cabinet 2025-02-02T14:25:58+00:00
CGTC 5, Day Two Talks
Combinatorial Game Theory 2025-02-02T17:36:00+00:00
CGTC 5, Day Three Talks
Combinatorial Game Theory 2025-02-03T00:02:00+00:00
The Alien Tome
EXO 2025-02-05T01:07:42+00:00
The familiar loneliness of the Kinetoscope
Res Obscura 2025-02-05T14:08:51+00:00
> 192: I will constitute the field
Laura Olin 2025-02-06T16:27:48+00:00
Pizza Roundup 004
Maybe Pizza? 2025-02-06T20:25:40+00:00
Cyberpunk: Broken Edges
Simonschreibt. 2025-02-10T10:39:11+00:00
Dogecoin: The Tippers – Part 3
The Curiosity Cabinet 2025-02-10T13:36:10+00:00
A THIRD DISTANCE
Infinite Gossip 2025-02-12T00:29:34+00:00
Anno 1800: Shadows of Beauty
Simonschreibt. 2025-02-12T21:33:09+00:00
Happy Lupercalia
Res Obscura 2025-02-13T19:12:44+00:00
Introducing MechA, the Opus Magnum Metric of Your Nightmares
a blog by biggiemac42 2025-02-16T16:53:46+00:00
My 2024 in Review
Winny's Blog 2025-02-18T06:00:00+00:00
The Shape of a Mars Mission
Idle Words 2025-02-19T23:36:00+00:00
Apocalypse Without End: D.H. Lawrence on Revelation
Steelsnowflake 2025-02-20T12:17:13+00:00
Infinity Nikki: One-way Window
Simonschreibt. 2025-02-22T22:52:09+00:00
Dogecoin: The Pump and Dumpers – Part 4
The Curiosity Cabinet 2025-02-23T14:34:30+00:00
INFINITE GOSSIP IS NOW ON GHOST
Infinite Gossip 2025-02-26T23:39:51+00:00
Ask a Midwesterner: Is Omaha Really America's Steak Capital?
Midwesterner 2025-02-27T12:01:00+00:00
Google Summer of Code 2025
Neovim 2025-02-28T00:00:00+00:00
Ponder This Challenge - March 2025 - Electric networks in graphs
IBM Ponder This 2025-03-01T00:00:00+00:00
Dystopia of Decadence: Huxley’s Brave New World and Ours as Well
Steelsnowflake 2025-03-01T12:11:39+00:00
Release the Hounds!
Demon 2025-03-01T22:40:24+00:00
Infinity Nikki: Mysterious Shadow Drop
Simonschreibt. 2025-03-02T19:47:47+00:00
The Middle Ages
The Pudding 2025-03-03T06:00:00+00:00
Why fastDoom is fast
Fabien Sanglard 2025-03-04T00:00:00+00:00
Surrendering Optimally
a blog by biggiemac42 2025-03-04T18:25:04+00:00
AI legibility, physical archives, and the future of research
Res Obscura 2025-03-05T18:21:34+00:00
Let's Talk About The American Dream
Coding Horror 2025-03-06T01:27:31+00:00
Conquering the Final Cycle
a blog by biggiemac42 2025-03-08T05:48:26+00:00
Vienna
ntoll.org 2025-03-09T18:30:00+00:00
Onomatopoeia Odyssey
The Pudding 2025-03-10T05:00:00+00:00
Interviewing #1 backyard composter Ahram Park
The Rot 2025-03-10T17:55:52+00:00
The Last Barbecue Joint in Cairo
Midwesterner 2025-03-12T11:03:24+00:00
> 193: I know now is not the time to take up flying.
Laura Olin 2025-03-13T15:22:06+00:00
Spring update now available in the Unstable beta
Barotrauma 2025-03-14T16:54:01+00:00
The High Heel Problem
Simonschreibt. 2025-03-17T19:16:55+00:00
What we can learn from watching reality stars apologize
The Pudding 2025-03-19T05:00:00+00:00
The Road Not Taken is Guaranteed Minimum Income
Coding Horror 2025-03-20T23:33:13+00:00
Dollar Country Newsletter, March 2025
Dollar Country Newsletter & Radio Show 2025-03-21T14:01:51+00:00
Neovim 0.11
Neovim 2025-03-22T00:00:00+00:00
All notable upcoming Japanese RPGs (JRPGs) in 2025
PS5 – Destructoid 2025-03-23T15:00:25+00:00
How to use Chaser Rounds in Monster Hunter Wilds
PS5 – Destructoid 2025-03-24T12:42:22+00:00
Indiana Jones and the Great Circle PS5 release date announced
PS5 – Destructoid 2025-03-24T15:46:53+00:00
Atomfall: the difference between the Standard and Deluxe Edition
PS5 – Destructoid 2025-03-25T14:25:03+00:00
Where to find Seaside Cendrelis in Wuthering Waves 2.2
PS5 – Destructoid 2025-03-27T08:44:10+00:00
Sims 4: Mirrors
Simonschreibt. 2025-03-27T22:05:42+00:00
3/29: Ding Dong The Bug Is Dead
Demon 2025-03-30T02:06:47+00:00
Is Camellya still worth pulling in Wuthering Waves 2.2?
PS5 – Destructoid 2025-03-31T17:05:34+00:00
Best Cantarella build in Wuthering Waves – Weapons, echoes, team compositions, and sequences
PS5 – Destructoid 2025-03-31T20:14:50+00:00
You Can Compost That?!
The Rot 2025-03-31T21:52:36+00:00
Ponder This Challenge - April 2025 - Klumpengeist
IBM Ponder This 2025-04-01T00:00:00+00:00
The Generational Legacy of Samples in Music
The Pudding 2025-04-01T05:00:00+00:00
Is Cantarella worth pulling in Wuthering Waves?
PS5 – Destructoid 2025-04-01T15:15:10+00:00
When Jorge Luis Borges met one of the founders of AI
Res Obscura 2025-04-02T17:27:24+00:00
Best support builds for Iansan in Genshin Impact
PS5 – Destructoid 2025-04-03T17:52:00+00:00
Coming next week: Calm Before the Storm update
Barotrauma 2025-04-04T16:07:16+00:00
> 194: I believe my courage will expand like a sponge cowboy in water
Laura Olin 2025-04-10T12:58:46+00:00
Forever trapped inside a picture after kissing an eldritch being: all about Lyle from Look Outside
Dark RPGs 2025-04-11T08:58:44+00:00
Emacs: Edit as root using sudo-edit
Winny's Blog 2025-04-11T16:22:17+00:00
Unlock all new weapons and characters in SaGa-themed Vampire Survivors update “Emerald Diorama”
PS5 – Destructoid 2025-04-11T20:11:24+00:00
How to quickly break the bounds of space and find love in Vampire Survivors
PS5 – Destructoid 2025-04-12T18:20:23+00:00
FREE COMPOST, now showing
The Rot 2025-04-12T20:39:25+00:00
Sprouts 2025 Morning Talks
Combinatorial Game Theory 2025-04-13T02:44:00+00:00
Sprouts 2025 Keynote and Afternoon Session Summaries
Combinatorial Game Theory 2025-04-13T14:11:00+00:00
Sprouts2025 Wrap-up
Combinatorial Game Theory 2025-04-13T14:59:00+00:00
The Pour-igin of Species
The Pudding 2025-04-16T05:00:00+00:00
Onfim's world
Res Obscura 2025-04-16T13:10:51+00:00
Trunk Update and 0.33 Tournament Announcement
Dungeon Crawl Stone Soup 2025-04-18T21:32:48+00:00
Breaking the Roman Republic: The Tragedy of Tiberius Gracchus
Steelsnowflake 2025-04-19T12:10:53+00:00
10PRINT inspired "Snowcrash" in Emacs
Winny's Blog 2025-04-19T17:40:25+00:00
K.O. II EP-133 Champions update cheat sheet
Spongefile 2025-04-19T20:58:51+00:00
Oblivion Remastered Fin Gleam Helm location
PS5 – Destructoid 2025-04-23T21:47:23+00:00
Building a tower to reach heaven in a world that could be from a cult lost PS2 RPG: Interview with Ghrian Studio, the developer of BURGGEIST
Dark RPGs 2025-04-24T09:26:10+00:00
How to get more Magicka in Oblivion Remastered
PS5 – Destructoid 2025-04-25T14:55:54+00:00
Update on my Racket exit
Winny's Blog 2025-04-25T19:47:23+00:00
On the aura of Ruth Stout & not sifting my compost
The Rot 2025-04-25T22:13:13+00:00
Augustus Didn't Kill the Roman Republic (It Was Already Dead)
Steelsnowflake 2025-04-28T17:50:09+00:00
0.33 Tournament Page
Dungeon Crawl Stone Soup 2025-04-30T01:45:08+00:00
Machine With Wishbone
Pleasant Realms 2025-04-30T18:00:47+00:00
Thoughts on time
Spongefile 2025-04-30T21:37:00+00:00
Ponder This Challenge - May 2025 - The prime arithmetic quiz
IBM Ponder This 2025-05-01T00:00:00+00:00
Do rabbits eat carrots because of Clark Gable?
Snack Stack 2025-05-01T11:53:00+00:00
Darkest Light
Steelsnowflake 2025-05-01T12:55:36+00:00
620 Club – St. Paul, MN
You Care What We Think 2025-05-01T16:10:00+00:00
0.33 “Reforge Yourself”
Dungeon Crawl Stone Soup 2025-05-02T17:07:35+00:00
Here We Are In Eden…
Demon 2025-05-05T03:42:40+00:00
Jellybean and Julia’s BBQ – Coon Rapids, MN
You Care What We Think 2025-05-05T16:30:00+00:00
Where does Grand Theft Auto 6 take place?
PS5 – Destructoid 2025-05-06T15:01:56+00:00
COMING UP WITH A COMPLETE LIST OF WAYS TO FEEL GOOD
Infinite Gossip 2025-05-07T03:17:13+00:00
deeelite1988
Pleasant Realms 2025-05-07T16:38:17+00:00
AI makes the humanities more important, but also a lot weirder
Res Obscura 2025-05-07T19:24:36+00:00
Are you more likely to die on your birthday?
The Pudding 2025-05-08T05:00:00+00:00
> 195: If I stand very still, I do no further harm.
Laura Olin 2025-05-08T12:19:20+00:00
Lulu’s Thai Noodle Shop – Kansas City, MO
You Care What We Think 2025-05-11T04:20:00+00:00
K.O. II EP-133 timing map
Spongefile 2025-05-12T10:30:40+00:00
Slap’s BBQ – Kansas City, MO
You Care What We Think 2025-05-14T16:30:00+00:00
Integers 2025 CGT Talks
Combinatorial Game Theory 2025-05-15T02:11:00+00:00
Guy Whipping A Massive Chain
Pleasant Realms 2025-05-15T15:19:56+00:00
An update on soil testing and LA after the fires.
The Rot 2025-05-16T17:03:22+00:00
Building my childhood dream PC
Fabien Sanglard 2025-05-18T00:00:00+00:00
Café Corazon – Kansas City, MO
You Care What We Think 2025-05-18T16:30:00+00:00
Building Number Factories in Beltmatic
a blog by biggiemac42 2025-05-19T05:31:24+00:00
Why were Belle Époque cities beautiful?
Res Obscura 2025-05-21T19:37:37+00:00
St Louis Skills on Wheels 2025
Pleasant Realms 2025-05-23T15:28:29+00:00
0.33 Tournament Results
Dungeon Crawl Stone Soup 2025-05-25T02:40:47+00:00
THE DEAL I STRUCK WITH BURGER KING
Infinite Gossip 2025-05-26T02:50:35+00:00
Access Control Syntax
journal.stuffwithstuff.com 2025-05-26T07:00:00+00:00
Hot Hands Pie & Biscuit – St. Paul, MN
You Care What We Think 2025-05-27T21:07:00+00:00
Asian Misrepresentation
The Pudding 2025-05-28T05:00:00+00:00
TP-7 guide: going deeper
Spongefile 2025-05-29T16:04:05+00:00
868-BACK trailer: UNIFIED
Mighty Vision 2025-05-29T22:10:00+00:00
Consider Knitting
journal.stuffwithstuff.com 2025-05-30T07:00:00+00:00
🕸️ 28 Years of Web Development
Nathan Youngman 2025-05-31T00:00:00+00:00
An Aesthetic Approach
ntoll.org 2025-05-31T10:30:00+00:00
The symbolism behind a grieving family: Analysis of the Axons of Clair Obscur Expedition 33
Dark RPGs 2025-05-31T14:33:18+00:00
Ponder This Challenge - June 2025 - Jumping frog game
IBM Ponder This 2025-06-01T00:00:00+00:00
Published for the first time: the Princeton INTERCAL Compiler's source code
esoteric.codes 2025-06-01T11:27:00+00:00
LLMs are cheap
Juho Snellman's Weblog 2025-06-02T22:00:00+00:00
Apps in the late stage gold rush
Spongefile 2025-06-03T10:45:21+00:00
The contested cracker from Southeast Asia
Snack Stack 2025-06-03T11:47:00+00:00
Sticker print run signup
Spongefile 2025-06-04T10:03:16+00:00
Every announcement from PlayStation State of Play – June 2025
PS5 – Destructoid 2025-06-04T22:02:20+00:00
The Loneliness Epidemic, in Data
The Pudding 2025-06-05T05:00:00+00:00
0.33.1 Bugfix Release
Dungeon Crawl Stone Soup 2025-06-06T23:59:01+00:00
On how to compost grass, a quickie edition
The Rot 2025-06-09T18:40:26+00:00
Stop Uploading Your Data to Google
ignorethecode.net 2025-06-11T20:00:07+00:00
30 Minutes with a Stranger
The Pudding 2025-06-12T05:00:00+00:00
> 196: Remember this
Laura Olin 2025-06-12T16:58:10+00:00
💭 Career Break: What Will I Do
Nathan Youngman 2025-06-13T00:00:00+00:00
💸 Career Break: How I Got Here
Nathan Youngman 2025-06-13T00:00:00+00:00
56: XTDB: A Bitemporal database in Clojure
The REPL 2025-06-13T08:23:56+00:00
PS5’s Wolverine game finally just made an appearance after several years of silence, and it’s not dead after all
PS5 – Destructoid 2025-06-13T16:32:47+00:00
Sony says PS5 is now more profitable than any PlayStation console before it, and PS6 is on the way
PS5 – Destructoid 2025-06-13T16:53:35+00:00
Wonderful Toolchain project update - June 2025
Posts on asie's blog 2025-06-15T00:00:00+00:00
DESERT MOVING, INC.
Infinite Gossip 2025-06-18T06:26:11+00:00
Our waste infrastructure lags behind the products we manufacture
The Rot 2025-06-26T17:52:11+00:00
Video Game Thoughts Bonus Bag #6
The Bottom Feeder 2025-06-26T18:01:07+00:00
Enter the Meadow
Wild Information 2025-06-29T14:01:27+00:00
Ponder This Challenge - July 2025 - Swallows on a Wire
IBM Ponder This 2025-07-01T00:00:00+00:00
Childhood California Fans
Pleasant Realms 2025-07-02T13:04:28+00:00
Is Helldivers 2 on Xbox?
PS5 – Destructoid 2025-07-06T18:45:00+00:00
Bears Will Be Boys
The Pudding 2025-07-07T05:00:00+00:00
Uncontrolled Remains
BLDGBLOG 2025-07-07T20:41:30+00:00
Architectural Dressage
BLDGBLOG 2025-07-08T16:03:43+00:00
TRICK SHOT
Infinite Gossip 2025-07-10T07:33:39+00:00
Dollar Country Episode 252: Beer Money
Dollar Country Newsletter & Radio Show 2025-07-10T14:45:19+00:00
Setting Up an SDL3 Mac App in XCode 16
journal.stuffwithstuff.com 2025-07-13T07:00:00+00:00
Dollar Country Episode 253: Lonesome Crazy
Dollar Country Newsletter & Radio Show 2025-07-13T12:02:48+00:00
🤖 Sudo Make Me A Triangle
Nathan Youngman 2025-07-15T00:00:00+00:00
Dollar Country Episode 254: Music From The Great Plains
Dollar Country Newsletter & Radio Show 2025-07-17T14:03:07+00:00
> 197: I knew so much and sang anyway
Laura Olin 2025-07-17T14:09:08+00:00
geo/acc
BLDGBLOG 2025-07-19T17:33:47+00:00
A New Economy
Demon 2025-07-20T16:54:25+00:00
Seer
BLDGBLOG 2025-07-20T21:20:52+00:00
Wallace Stevens and the Poetry We No Longer Write
Steelsnowflake 2025-07-22T18:51:55+00:00
NYC's Urban Textscape
The Pudding 2025-07-24T05:00:00+00:00
Dollar Country Episode 255: Makin' Steel
Dollar Country Newsletter & Radio Show 2025-07-24T14:03:00+00:00
Those Secret Fonts from the ISA-16 PS/2 Models (Again)
int10h.org - VileR's blog 2025-07-26T15:56:43+00:00
When Cats and a Cat God help you escape from a SCP-like facility in the dark JRPG Break Wolf [Mechanic]
Dark RPGs 2025-07-28T09:48:03+00:00
Coming Soon ... Avernum 4: Greed and Glory!
The Bottom Feeder 2025-07-30T19:00:26+00:00
OpenAI's "Study Mode" and the risks of flattery
Res Obscura 2025-07-31T13:32:16+00:00
Ponder This Challenge - August 2025 - A grid-cutting game
IBM Ponder This 2025-08-01T00:00:00+00:00
Mr. Mustacheo – West St. Paul, MN
You Care What We Think 2025-08-02T03:36:00+00:00
Eating the Engram
Wild Information 2025-08-03T15:01:44+00:00
This one-and-done PSP masterpiece healed my aversion to tactical RPGs
PS5 – Destructoid 2025-08-04T17:14:20+00:00
Mineral Hurricane
BLDGBLOG 2025-08-04T21:07:15+00:00
Quadratic Number Fields
nklein software 2025-08-05T02:46:13+00:00
Bag of words, have mercy on us
Experimental History 2025-08-05T20:06:09+00:00
My new book: YOU MUST UNDERSTAND THIS IF YOU WANT TO LIVE
Infinite Gossip 2025-08-06T00:50:58+00:00
22 Northmen Brewing Company – Alexandria, MN
You Care What We Think 2025-08-06T16:30:00+00:00
I Have No Mouth, and I Must Scream: The 30 Year Late Review
The Bottom Feeder 2025-08-06T18:48:54+00:00
Designing for Mastery in Roguelikes (w/Roguelike Radio)
Grid Sage Games 2025-08-07T00:47:51+00:00
Italy's undercover pizza detectives (AVPN Is a Scam)
Maybe Pizza? 2025-08-07T20:30:01+00:00
Still A Pressing Issue
Discworld MUD Dev Blog 2025-08-11T10:29:39+00:00
Dicing an Onion, the Mathematically Optimal Way
The Pudding 2025-08-12T05:00:00+00:00
Making Everything Groovy
Discworld MUD Dev Blog 2025-08-12T11:25:51+00:00
All Souls exam questions and the limits of machine reasoning
Res Obscura 2025-08-13T20:33:27+00:00
How To Get Internet Feedback Without Going Insane
The Bottom Feeder 2025-08-14T18:04:47+00:00
Youthful Indiscretion
Discworld MUD Dev Blog 2025-08-19T08:56:59+00:00
PS5 joins price-hike party in US due to ‘challenging economic environment’
PS5 – Destructoid 2025-08-20T15:40:50+00:00
2025 Minnesota State Fair - Falcon Heights, MN
You Care What We Think 2025-08-21T18:28:00+00:00
LET ME TELL YOU ABOUT MY SAAB X
Infinite Gossip 2025-08-26T10:39:09+00:00
How to Make A Mushroom Soda (That Tastes Like Peach)
Midwesterner 2025-08-27T12:03:18+00:00
> 198: The world is a laden thing
Laura Olin 2025-08-28T14:51:26+00:00
😎 Summer Break
Nathan Youngman 2025-08-29T00:00:00+00:00
🌱 My Vegan Journey
Nathan Youngman 2025-08-31T00:00:00+00:00
Ponder This Challenge - September 2025 - Cake flip-cutting
IBM Ponder This 2025-09-01T00:00:00+00:00
Use this magic bullet to shoot yourself in the foot
Experimental History 2025-09-02T16:48:01+00:00
Stickers printed
Spongefile 2025-09-03T20:36:38+00:00
I Review the New Dungeons & Dragons Art, Unwisely
The Bottom Feeder 2025-09-04T20:19:40+00:00
untitled
Jon Rafman 2025-09-04T21:18:43+00:00
Wonderful Toolchain project update - September 2025
Posts on asie's blog 2025-09-05T00:00:00+00:00
untitled
Jon Rafman 2025-09-06T14:38:10+00:00
untitled
Jon Rafman 2025-09-07T01:20:28+00:00
untitled
Jon Rafman 2025-09-07T15:00:35+00:00
The curious history of Chicken in a Biskit
Snack Stack 2025-09-07T19:10:39+00:00
untitled
Jon Rafman 2025-09-08T15:00:26+00:00
untitled
Jon Rafman 2025-09-09T05:08:34+00:00
untitled
Jon Rafman 2025-09-09T15:05:13+00:00
untitled
Jon Rafman 2025-09-09T19:04:30+00:00
untitled
Jon Rafman 2025-09-09T19:24:50+00:00
untitled
Jon Rafman 2025-09-09T21:50:49+00:00
untitled
Jon Rafman 2025-09-09T23:52:37+00:00
Found The Thread
Discworld MUD Dev Blog 2025-09-10T17:47:56+00:00
untitled
Jon Rafman 2025-09-10T21:16:07+00:00
THE MACHINE HAS TO BE WRONG
Infinite Gossip 2025-09-11T06:52:13+00:00
untitled
Jon Rafman 2025-09-14T14:53:12+00:00
Pizza Roundup 005
Maybe Pizza? 2025-09-14T22:27:17+00:00
untitled
STML 2025-09-15T09:05:41+00:00
The mousy snack for a Dutch baby
Snack Stack 2025-09-15T19:53:15+00:00
untitled
Jon Rafman 2025-09-16T03:28:42+00:00
untitled
STML 2025-09-16T06:04:25+00:00
Return of CTRI Innovations
Fujichia 2025-09-16T11:44:31+00:00
Blog Extravaganza 2025: the winners
Experimental History 2025-09-16T14:14:58+00:00
7 things we want to see from the next PlayStation State of Play
PS5 – Destructoid 2025-09-16T17:53:45+00:00
untitled
Jon Rafman 2025-09-17T01:33:09+00:00
The Avernum 4 Story, Part 2: What Went Wrong and Why
The Bottom Feeder 2025-09-17T18:18:22+00:00
untitled
Jon Rafman 2025-09-18T01:08:43+00:00
The moments
Escaping Flatland 2025-09-18T08:51:32+00:00
A Blanket Solution
Discworld MUD Dev Blog 2025-09-18T15:01:34+00:00
Star Forts, Mines, and Other Maastricht Subterranea
BLDGBLOG 2025-09-18T17:53:53+00:00
Celestial Detector
BLDGBLOG 2025-09-19T05:42:09+00:00
untitled
Jon Rafman 2025-09-23T07:45:26+00:00
untitled
Jon Rafman 2025-09-23T08:01:15+00:00
A Couple Of QoL Tweaks To Coatings
Discworld MUD Dev Blog 2025-09-23T13:39:42+00:00
Touch me touch me
Muppe 2025-09-23T19:57:28+00:00
I’m crotchwalking at you in the deadmans night, are you coming with me or are you coming with me, or…
Muppe 2025-09-23T20:01:28+00:00
buuble butt
Muppe 2025-09-23T20:01:47+00:00
REAR-ending
Muppe 2025-09-23T20:02:23+00:00
the feigned sound of a whistle in the town during noon
Muppe 2025-09-23T20:03:23+00:00
Sometimes you’re brave
Muppe 2025-09-23T20:04:30+00:00
A nerving..
Muppe 2025-09-23T20:05:20+00:00
Howl like the Wet-Nap washed you
Muppe 2025-09-23T20:06:16+00:00
nothing big is coming
Muppe 2025-09-23T20:10:02+00:00
What do you think it all means, Chris?
Muppe 2025-09-23T20:24:24+00:00
Follow like piper
Muppe 2025-09-23T20:26:16+00:00
Byung-Chul Han, Anomalisa, and the Myth of Sameness
Steelsnowflake 2025-09-23T22:26:07+00:00
can I be your puckish pet again?
Muppe 2025-09-23T23:26:39+00:00
SNIKT! Marvel’s Wolverine finally makes triumphant return at new PS5 State of Play showcase
PS5 – Destructoid 2025-09-24T21:49:11+00:00
> 199: I am building what I cannot break
Laura Olin 2025-09-25T13:58:42+00:00
Forty-Four Esolangs: an artist's monograph of programming languages
esoteric.codes 2025-09-26T13:37:00+00:00
Bruce Loose RIP
Fujichia 2025-09-26T21:54:46+00:00
I made a newspaper
The Rot 2025-09-26T23:45:45+00:00
untitled
Jon Rafman 2025-09-27T02:09:34+00:00
Maisey Goes To Therapy
Discworld MUD Dev Blog 2025-09-29T07:44:34+00:00
Bodhisattva
Discworld MUD Dev Blog 2025-09-30T11:17:07+00:00
Thank you for being annoying
Experimental History 2025-09-30T14:35:46+00:00
Glenn Ligon, “Condition Report” (2000)
STML 2025-09-30T18:18:27+00:00
Ponder This Challenge - October 2025 - Counting Mazes
IBM Ponder This 2025-09-30T23:00:00+00:00
How I read
Escaping Flatland 2025-10-01T12:30:14+00:00
Obscure Emacs Package: ssh-config-mode
Winny's Blog 2025-10-01T20:24:47+00:00
Video Game Thoughts Bonus Bag #7
The Bottom Feeder 2025-10-01T21:05:56+00:00
The Age of Books and the Age of Brainrot
Res Obscura 2025-10-02T13:45:15+00:00
How to fix Black Ops 7 ‘files cannot be managed in game by users on this platform’ error
PS5 – Destructoid 2025-10-02T20:18:28+00:00
Which horse should you choose in Ghost of Yotei? Old Trails quest decision
PS5 – Destructoid 2025-10-02T20:51:53+00:00
When is Ghost of Yotei coming to PC?
PS5 – Destructoid 2025-10-02T21:25:30+00:00
Napoleon's Fiasco: The Last Days of the Haitian Revolution
Steelsnowflake 2025-10-03T00:10:50+00:00
Book Review: Ulysses Audiobook
Fujichia 2025-10-03T11:31:35+00:00
untitled
Jon Rafman 2025-10-03T18:14:13+00:00
Unity Security Vulnerability Update
Demon 2025-10-03T23:29:22+00:00
Birth of Prettier
Vjeux 2025-10-04T20:33:06+00:00
Sick Reverie
Steelsnowflake 2025-10-06T05:00:00+00:00
The smell of earth
The Rot 2025-10-06T18:05:16+00:00
Debian Package Stats using Sqlite
Winny's Blog 2025-10-06T23:38:17+00:00
Python 3.14 - Changes to look for
Winny's Blog 2025-10-07T20:55:58+00:00
OUT ON THE DUNGEON FLOOR
Infinite Gossip 2025-10-08T06:02:40+00:00
Agentic fragments
Escaping Flatland 2025-10-08T13:25:01+00:00
The Rats in Look Outside: A spreading disease of fur and teeth
Dark RPGs 2025-10-08T14:26:42+00:00
The Chicago three-dick salute
Food is Stupid 2025-10-10T13:03:14+00:00
G-G on Facebook - G-G on Twitter
garfield minus garfield 2025-10-12T19:15:11+00:00
Some usecases for GNU Units
Winny's Blog 2025-10-12T19:33:05+00:00
Bamboozle me, daddy
Experimental History 2025-10-14T15:48:04+00:00
Avernum 4: Greed and Glory Demo Out, Plus An Interview
The Bottom Feeder 2025-10-14T18:50:35+00:00
List Of Topics Discussed
Fujichia 2025-10-15T20:27:24+00:00
Meet Danny and his compost app Peels
The Rot 2025-10-17T15:51:18+00:00
Another Coat Of Pain
Discworld MUD Dev Blog 2025-10-18T15:15:18+00:00
What it’s like to walk across Massachusetts
The Pudding 2025-10-20T05:00:00+00:00
One of my favourite paintings:
STML 2025-10-21T06:21:56+00:00
“Terracotta anatomical votive; human eye.” 3rdC BC-1stC BC, Italy.
STML 2025-10-21T06:38:40+00:00
Eye idol ca. 3700–3500 BCE On view at The Met Fifth Avenue in Gallery 202 This type of figurine…
STML 2025-10-21T06:40:36+00:00
untitled
STML 2025-10-21T06:45:40+00:00
Lover’s Eye, returned to Lender.
STML 2025-10-21T06:46:12+00:00
The curious, contentious history of pumpkin spice lattes
Snack Stack 2025-10-21T19:55:31+00:00
Sheathing And A Peek Ahead
Discworld MUD Dev Blog 2025-10-22T05:15:07+00:00
A Case Of Scope Creep
Discworld MUD Dev Blog 2025-10-22T07:40:34+00:00
Avernum 4: Greed And Glory Is Out!
The Bottom Feeder 2025-10-22T13:25:55+00:00
Recent Spooky Movies
Fujichia 2025-10-23T11:26:15+00:00
When is it better to think without words?
Escaping Flatland 2025-10-23T12:10:48+00:00
This stunningly gorgeous open-world RPG from Korean devs looks too good to be true, and I’m hoping they prove me wrong
PS5 – Destructoid 2025-10-23T21:25:15+00:00
I Can't Believe It's Not Butter! Chicken
Food is Stupid 2025-10-24T13:02:49+00:00
text-mode: [水墨]奔马图 by Gatchaman (2011).
text-mode 2025-10-24T18:31:02+00:00
actegratuit: Mantras and Meditations by Meg Hitchcock
text-mode 2025-10-25T18:31:07+00:00
My Door Is Aways Unlocked
Discworld MUD Dev Blog 2025-10-26T16:10:13+00:00
text-mode: Zdeněk Sýkora’s ventilation tower and paintings....
text-mode 2025-10-26T18:31:08+00:00
text-mode: The myth about Bird B (K. Holten & E. Mourier,...
text-mode 2025-10-27T19:30:39+00:00
The Decline of Deviance
Experimental History 2025-10-28T15:21:32+00:00
text-mode: ‘Grace Triptych’ by Keira Rathbone, 2011.
text-mode 2025-10-28T19:30:33+00:00
There's Always A Catch.
Discworld MUD Dev Blog 2025-10-29T15:10:59+00:00
text-mode: Jiří Valoch “Homage o Ladislav Novák”
text-mode 2025-10-29T19:30:32+00:00
text-mode: Three mainstream computer heroes, rendered in their...
text-mode 2025-10-30T19:31:02+00:00
I Can't Believe It's Not Butter! Or Chicken!
Food is Stupid 2025-10-31T13:02:50+00:00
text-mode: Klaus Basset, Kubus, 1974. Typewriter graphics only...
text-mode 2025-10-31T19:30:39+00:00
Ponder This Challenge - November 2025 - The CAT sequence
IBM Ponder This 2025-11-01T00:00:00+00:00
The Epic Last Stand of Louis Delgrès: “Live Free or Die”
Steelsnowflake 2025-11-01T12:18:43+00:00
text-mode: Space Harrier, text mode version. For the Japanese...
text-mode 2025-11-01T19:30:36+00:00
text-mode: Carl Fernbach-Flarsheim - Boolean Image/Conceptual...
text-mode 2025-11-02T19:30:53+00:00
In pursuit of democracy
The Pudding 2025-11-03T06:00:00+00:00
We Need to Talk About Black Walnuts (Again)
Midwesterner 2025-11-03T12:03:01+00:00
text-mode: Commodore 64 PETSCII acid wolf fax graphics by...
text-mode 2025-11-03T19:30:56+00:00
A list of books and essays that I love
Escaping Flatland 2025-11-04T11:27:24+00:00
New features of the EP-40 Riddim
Spongefile 2025-11-04T14:23:33+00:00
text-mode: Kindergarten Paper Weavings from circa 1900...
text-mode 2025-11-04T19:30:57+00:00
"get 1000 Mountains Of Ash"
Discworld MUD Dev Blog 2025-11-05T05:10:38+00:00
A Lost IBM PC/AT Model? Analyzing a Newfound Old BIOS
int10h.org - VileR's blog 2025-11-05T07:07:01+00:00
Introducing: Springfield-Style Black Walnut Chicken
Midwesterner 2025-11-05T12:02:52+00:00
The PS5 handheld’s latest update may have just made it a must-buy for the holiday season
PS5 – Destructoid 2025-11-05T15:36:53+00:00
Lioconcha hieroglyphica is a saltwater clam that makes cellular automata-style patterns in some kind…
text-mode 2025-11-05T19:30:36+00:00
Four Ways To Make Your Turn-Based Game More Interesting (Or Ruin It)
The Bottom Feeder 2025-11-05T20:50:09+00:00
> 200: We were trying to live a personal life
Laura Olin 2025-11-06T14:32:22+00:00
Can automation help make the humanities more human?
Res Obscura 2025-11-06T23:08:09+00:00
Lighting The Way
Discworld MUD Dev Blog 2025-11-07T05:42:22+00:00
The Black Walnut Snack Pack
Midwesterner 2025-11-07T12:03:36+00:00
Shrimp cocktail
Food is Stupid 2025-11-07T14:03:17+00:00
A Heavier Coat Of Pain
Discworld MUD Dev Blog 2025-11-08T13:43:40+00:00
It's not always easy to leave your leaves
The Rot 2025-11-10T00:27:36+00:00
THE HANDSOME BROTHERS
Infinite Gossip 2025-11-10T05:57:26+00:00
this guy sucks at throwing
Terrible Banana 2025-11-11T18:38:24+00:00
You say potato, I say leprosy
Experimental History 2025-11-12T00:45:53+00:00
The McSlug
Food is Stupid 2025-11-14T14:03:38+00:00
How quake.exe got its TCP/IP stack
Fabien Sanglard 2025-11-17T00:00:00+00:00
The Time I Annoyed Lord British and He Gave Me His Debris
The Bottom Feeder 2025-11-17T22:20:29+00:00
How well can Gemini 3 make a Henry James simulator?
Res Obscura 2025-11-19T00:27:38+00:00
“We strike a balance of what we call a “grounded openness” that avoids the traps of provincialism…
STML 2025-11-19T10:05:34+00:00
so a very long time ago, my dad worked with an arson investigator
STML 2025-11-19T10:06:55+00:00
When I accept myself just as I am, I change
Escaping Flatland 2025-11-19T10:46:28+00:00
Talk To Me
Fujichia 2025-11-19T18:05:14+00:00
Evil pizza
Food is Stupid 2025-11-21T14:03:38+00:00
Please, Support Books
ignorethecode.net 2025-11-22T14:21:51+00:00
Grimdark JRPGs for fans of Fear & Hunger
Dark RPGs 2025-11-23T14:07:13+00:00
Avatar: The Last Airbender Draft, Round 2 (WUBRG Drafting)
Mediocre Magic 2025-11-23T15:35:00+00:00
Quake Engine Indicators
Fabien Sanglard 2025-11-24T00:00:00+00:00
Frepack Draft: Clue and Explorers of Ixalan
Mediocre Magic 2025-11-24T01:42:00+00:00
Composting textiles with Everybody.World
The Rot 2025-11-24T19:33:37+00:00
Understanding The Player Brain, Pt. 1: Loss Avoidance
The Bottom Feeder 2025-11-24T21:42:31+00:00
Zombie cakes, the dead dessert of the 1950s
Snack Stack 2025-11-24T23:06:45+00:00
Secrets of the ancient memelords
Experimental History 2025-11-25T19:17:23+00:00
Year Of Me
Mighty Vision 2025-11-25T23:07:00+00:00
Generalized Worley Noise
Ian Henry 2025-11-26T00:00:00+00:00
How the EP series fader works
Spongefile 2025-11-26T10:23:43+00:00
Hogswatch Timing
Discworld MUD Dev Blog 2025-11-27T20:48:43+00:00
Wonderful Toolchain project update - November 2025
Posts on asie's blog 2025-11-30T00:00:00+00:00
Periodic Spaces
Ian Henry 2025-11-30T00:00:00+00:00
Ponder This Challenge - December 2025 - Sums of a prime and an even number
IBM Ponder This 2025-11-30T22:15:00+00:00
Electricity for fun (and mechatronics teachers)
Spongefile 2025-12-02T00:33:00+00:00
Just and loving seeing
Escaping Flatland 2025-12-02T12:55:45+00:00
Why WinQuake exists and how it works
Fabien Sanglard 2025-12-03T00:00:00+00:00
Ravnica Clue + Avatar: TLA Beginner's Box
Mediocre Magic 2025-12-03T23:55:00+00:00
Why I have been writing a niche history blog for 15 years
Res Obscura 2025-12-04T18:43:04+00:00
> 201: A taxi cab floating across three lanes with its lamp lit
Laura Olin 2025-12-04T19:43:07+00:00
Elattes
Food is Stupid 2025-12-05T14:02:44+00:00
Teenage Engineering connection cheat sheets
Spongefile 2025-12-05T14:38:47+00:00
The sunny snack from Azerbaijan
Snack Stack 2025-12-05T20:00:24+00:00
A Full Pod of Chaos (WUBRG Drafting)
Mediocre Magic 2025-12-07T01:02:00+00:00
Feed the Soil (and the rest will follow)
Wild Information 2025-12-07T22:35:34+00:00
The hidden Superbosses of Look Outside (till v2.1)
Dark RPGs 2025-12-08T20:19:51+00:00
Common Threads
The Pudding 2025-12-09T06:00:00+00:00
The drug that taught me how much I should suffer
Experimental History 2025-12-09T17:46:17+00:00
Sean Sherman's Pápa Waháŋpi
Midwesterner 2025-12-09T22:37:36+00:00
[Outliers] Bernie Marcus: The Home Depot Story
Farnam Street 2025-12-11T10:30:00+00:00
Ask a Midwesterner: Why Can't I Find a Bowl of Pápa Waháŋpi?
Midwesterner 2025-12-11T12:03:07+00:00
Reflections on my first year writing full time
Escaping Flatland 2025-12-11T17:12:02+00:00
Video Game Thoughts Bonus Bag #8
The Bottom Feeder 2025-12-11T22:18:47+00:00
Haiku Activity & Contract Report, November 2025 (ft. Go)
Haiku Project 2025-12-12T21:20:00+00:00
Microbes at work
The Rot 2025-12-16T17:43:34+00:00
FIRST OF JUNE
Infinite Gossip 2025-12-17T04:34:44+00:00
Game console prices reached unfortunate highs in 2025, and November sales hit rockbottom because of it
PS5 – Destructoid 2025-12-17T17:17:13+00:00
events this week!!
Fujichia 2025-12-17T23:00:08+00:00
Year 12 of the Cogmind
Grid Sage Games 2025-12-18T02:54:36+00:00
Be Your Best in 2026: The Most Important Lessons from The Knowledge Project (2025)
Farnam Street 2025-12-18T10:30:00+00:00
The closest thing we might ever get to a new Dino Crisis game is coming in a few weeks, but I’m extremely skeptical
PS5 – Destructoid 2025-12-18T16:03:52+00:00
WHAT I LEARNED ABOUT PUBLISHING SHORT STORIES ONLINE IN 2025
Infinite Gossip 2025-12-19T03:34:53+00:00
Nog Eggs
Food is Stupid 2025-12-19T14:03:10+00:00
The Gerrit code review iceberg, episode 3
Haiku Project 2025-12-19T20:30:00+00:00
Two Multiplayer Mostly-Avatar Drafts (WUBRG Drafting)
Mediocre Magic 2025-12-21T04:53:00+00:00
Pfeffernög
Midwesterner 2025-12-22T12:00:29+00:00
"AI" is bad UX
Apperceptive by Sam 2025-12-22T16:34:01+00:00
Chaos Collects Clues (WUBRG Drafting)
Mediocre Magic 2025-12-24T00:01:00+00:00
The Outlier Playbook: The Patterns Behind Enduring Success
Farnam Street 2025-12-25T10:30:00+00:00
Top ten composts this year
The Rot 2025-12-26T21:51:20+00:00
Pierre Poilievre on the Role of Government, Freedom, and Affordability
Farnam Street 2025-12-27T12:13:05+00:00
Those Curious Naturalists
Wild Information 2025-12-28T16:39:01+00:00
worldsofzzt: Source “Toypole” by Agent Orange (2022) Published...
text-mode 2025-12-29T11:22:48+00:00
untitled
text-mode.org 2025-12-29T14:37:26+00:00
untitled
text-mode.org 2025-12-29T14:45:46+00:00
untitled
text-mode.org 2025-12-29T15:10:13+00:00
untitled
text-mode.org 2025-12-29T15:24:12+00:00
untitled
text-mode.org 2025-12-29T15:35:29+00:00
Typewriter works by Montserrat Alberich Escardívol (1912-1973). Her first exhibition was in 1929…
text-mode 2025-12-29T19:30:37+00:00
The Top Ways Video Games Affect Your Brain. Number Five May Disturb You!
The Bottom Feeder 2025-12-29T21:43:22+00:00
Ponder This Challenge - January 2026 - Number splitting
IBM Ponder This 2025-12-29T22:15:00+00:00
Ravnica Clue EDH over SpellTable
Mediocre Magic 2025-12-30T03:22:00+00:00
Various works by Haji, 1998-2001. via 16colors
text-mode 2025-12-30T19:30:29+00:00
Between Ruin & Repair
City of Yes 2025-12-31T15:30:39+00:00
Anamie by Hack n’ Trade and Razor 1991. A PC-demo based on Amiga ASCII and some custom characters.
text-mode 2025-12-31T19:30:31+00:00
The Gerrit code review iceberg, episode 4
Haiku Project 2025-12-31T22:30:00+00:00
James Clear: How to Build Good Habits & Break Bad Ones
Farnam Street 2026-01-01T10:30:00+00:00
> 202: What resonated, 2025
Laura Olin 2026-01-01T15:36:44+00:00
Näyttää Betonilta by Duce, 2025. C64 PETSCII, inspired by Odeith.
text-mode 2026-01-01T19:30:31+00:00
The Power of the Image. Artists vs fascists
We Make Money Not Art 2026-01-02T14:48:19+00:00
fungi.neocities.org is a site by Polyducks, who’s been featured here many times before. It features…
text-mode 2026-01-02T19:30:34+00:00
Textual Paint is a textmode version of MS Paint that runs in the terminal. Made by Isaiah Odhner.
text-mode 2026-01-03T19:30:29+00:00
Interview with yayimhere
esoteric.codes 2026-01-05T04:06:00+00:00
The dating industry is weird.
The Curiosity Cabinet 2026-01-06T03:26:05+00:00
The secrets of human-animal hybrids escaping from a SCP-like facility: Interview with RE Atelier, the team behind the great JRPG Break Wolf
Dark RPGs 2026-01-06T11:20:33+00:00
How to be less awkward
Experimental History 2026-01-06T16:45:44+00:00
Being creative requires taking risks
Escaping Flatland 2026-01-07T11:59:34+00:00
Chaos is Clued In (WUBRG Drafting)
Mediocre Magic 2026-01-07T21:46:00+00:00
Building a 1997 Quake PC!
Fabien Sanglard 2026-01-08T00:00:00+00:00
[Outliers] The Multidisciplinary Approach to Thinking | Peter D. Kaufman
Farnam Street 2026-01-08T10:18:00+00:00
Chaos Jumpstart Clue
Mediocre Magic 2026-01-08T15:37:00+00:00
Algorithmic hover states with contrast-color()
daverupert.com 2026-01-08T16:46:00+00:00
Using your design system colors with contrast-color()
daverupert.com 2026-01-09T03:21:00+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-01-09T13:18:28+00:00
untitled
text-mode.org 2026-01-09T14:00:19+00:00
Interpolate contrast-color() to manipulate lightness
daverupert.com 2026-01-09T15:35:00+00:00
Only Connect
City of Yes 2026-01-09T16:16:21+00:00
The computer is the key to love.
The Curiosity Cabinet 2026-01-09T20:22:58+00:00
Murder
Mark Bernstein 2026-01-10T15:25:05+00:00
Focus rings with nested contrast-color()?
daverupert.com 2026-01-11T18:30:00+00:00
0.34 Trunk Update and Tournament Announcement
Dungeon Crawl Stone Soup 2026-01-11T20:04:32+00:00
Building a 1997 Quake PC: Benchmarking Quake
Fabien Sanglard 2026-01-12T00:00:00+00:00
Atlas of Borders. Walls, Migrations and Conflict in 70 Maps
We Make Money Not Art 2026-01-12T09:52:44+00:00
Disunion
Mark Bernstein 2026-01-12T13:00:52+00:00
Names
Mark Bernstein 2026-01-12T13:15:38+00:00
Questions
Mark Bernstein 2026-01-12T13:29:51+00:00
Building a 1997 Quake PC: Benchmarking Vquake
Fabien Sanglard 2026-01-13T00:00:00+00:00
Haiku Activity & Contract Report, December 2025
Haiku Project 2026-01-13T01:00:00+00:00
Building a 1997 Quake PC: Benchmarking GLquake
Fabien Sanglard 2026-01-14T00:00:00+00:00
HUD, History, and What’s Ahead
City of Yes 2026-01-14T14:02:53+00:00
Morgan Housel: Wealth is What You Have Minus What You Want
Farnam Street 2026-01-15T10:30:00+00:00
On the preparations before writing an essay
Escaping Flatland 2026-01-15T11:03:57+00:00
What making community compost means now
The Rot 2026-01-15T15:46:49+00:00
Is QSpy still cool? Let's play QuakeWorld!
Fabien Sanglard 2026-01-16T00:00:00+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-01-16T13:16:27+00:00
When “Just One More Lane” Runs Out of Road
City of Yes 2026-01-16T14:25:23+00:00
150 questions to fall in love.
The Curiosity Cabinet 2026-01-16T16:20:28+00:00
MACGUFFIN WORLD
Infinite Gossip 2026-01-17T04:36:10+00:00
Not only Nemesis and Mr X: immortal stalkers and chasing enemies in turn-based JRPGs [Updated Jan 2026]
Dark RPGs 2026-01-17T09:32:55+00:00
A Hedgehog in a Fox’s World: Paul Kingsnorth’s "Against the Machine"
Steelsnowflake 2026-01-17T14:09:17+00:00
Lorwyn Eclipsed Prerelease
Mediocre Magic 2026-01-17T21:51:00+00:00
The best version of my site so far...
daverupert.com 2026-01-18T14:29:00+00:00
On Canned Tomatoes (They Can Be Pretty Awesome)
Maybe Pizza? 2026-01-18T22:54:47+00:00
Jews and Words
Mark Bernstein 2026-01-19T14:35:38+00:00
La Société Automatique. Reminding us that the tech industry is based on myths as much as on science
We Make Money Not Art 2026-01-19T15:20:50+00:00
MBBP
Midwesterner 2026-01-20T12:03:04+00:00
Text is king
Experimental History 2026-01-20T18:55:47+00:00
FOSDEM 2026
Haiku Project 2026-01-22T07:00:00+00:00
[Outliers] Ray Kroc: How McDonald’s Took Over America
Farnam Street 2026-01-22T10:30:00+00:00
The Age of Assholes
City of Yes 2026-01-22T14:03:14+00:00
A Psalm for the Wild-Built
Mark Bernstein 2026-01-22T19:50:31+00:00
Lorwyn Eclipsed Clue Draft (WUBRG Drafting)
Mediocre Magic 2026-01-23T02:30:00+00:00
untitled
STML 2026-01-23T10:47:55+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-01-23T13:10:37+00:00
Oysters jubilee
Food is Stupid 2026-01-23T14:03:17+00:00
Omikron: The Nomad Soul
The Digital Antiquarian 2026-01-23T16:34:02+00:00
Video Game Thoughts Bonus Bag #9
The Bottom Feeder 2026-01-23T19:31:30+00:00
Ravnica Clue with Assorted Jumpstart Packs
Mediocre Magic 2026-01-24T02:22:00+00:00
The Value of Things
journal.stuffwithstuff.com 2026-01-24T08:00:00+00:00
Waiting for the power to go out
daverupert.com 2026-01-24T21:44:00+00:00
A beauty pageant, judged by a computer.
The Curiosity Cabinet 2026-01-25T13:31:07+00:00
Porting 100k lines from TypeScript to Rust using Claude Code in a month
Vjeux 2026-01-25T17:42:47+00:00
Lorwyn Eclipsed Draft (WUBRG Drafting)
Mediocre Magic 2026-01-26T02:27:00+00:00
I'm swearing off APIs entirely
daverupert.com 2026-01-26T06:27:00+00:00
I know your secret
Experimental History 2026-01-27T16:22:32+00:00
The Bullet That Missed
Mark Bernstein 2026-01-27T17:48:24+00:00
Ponder This Challenge - February 2026 - Blot-avoiding backgammon strategy
IBM Ponder This 2026-01-27T22:00:00+00:00
On political power
Escaping Flatland 2026-01-28T12:03:38+00:00
0.34 Tournament Page and Trunk Update
Dungeon Crawl Stone Soup 2026-01-29T03:23:59+00:00
Michael Ovitz: The Psychology of Power
Farnam Street 2026-01-29T10:30:00+00:00
Repost: Jerry's Apartment
City of Yes 2026-01-29T14:02:37+00:00
It's still what computers still can't do
Apperceptive by Sam 2026-01-29T21:06:22+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-01-30T13:18:28+00:00
A wedding in a peculiar venue.
The Curiosity Cabinet 2026-01-30T13:42:53+00:00
Honey bunion rings
Food is Stupid 2026-01-30T14:03:29+00:00
The Cookie Theory of Collective Action
Snack Stack 2026-01-30T21:52:40+00:00
Frepack Draft #(n+12) (WUBRG Drafting)
Mediocre Magic 2026-02-02T01:19:00+00:00
From $0 to $100 million in DonutSMP over the weekend
Vjeux 2026-02-02T03:52:33+00:00
The Gerrit code review iceberg, episode 5
Haiku Project 2026-02-03T12:30:00+00:00
Underrated ways to change the world, vol. II
Experimental History 2026-02-03T16:28:19+00:00
Launching The Rural Guaranteed Minimum Income Initiative
Coding Horror 2026-02-04T07:43:56+00:00
Write about the future you want
daverupert.com 2026-02-04T15:45:00+00:00
Let's compile Quake like it's 1997!
Fabien Sanglard 2026-02-05T00:00:00+00:00
> 203: I stand at the lip of a pouting valley—SPEAK TO ME!
Laura Olin 2026-02-05T13:04:03+00:00
0.34 “Doomed Geometries”
Dungeon Crawl Stone Soup 2026-02-06T02:33:27+00:00
Things that connect us to ourselves, and things that don't
Escaping Flatland 2026-02-06T11:54:11+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-02-06T13:12:29+00:00
A Visit to the Bentham Project at University College London (UCL)
Practical Ethics 2026-02-06T14:46:08+00:00
Ultima IX
The Digital Antiquarian 2026-02-06T17:09:05+00:00
You can just dig a hole.
The Rot 2026-02-07T22:53:25+00:00
Superb Snack History: The secret life of seven-layer dip
Snack Stack 2026-02-08T20:51:43+00:00
Magic Words
daverupert.com 2026-02-09T16:03:00+00:00
Chaos needs to read the whole card (WUBRG Drafting)
Mediocre Magic 2026-02-10T02:31:00+00:00
CHRISTOPHER'S STRONG LEGS
Infinite Gossip 2026-02-10T06:05:12+00:00
Nicolai Tangen: The $2 Trillion Mind
Farnam Street 2026-02-12T10:30:00+00:00
Gemini's Hypothetical Present
Jeff Kaufman 2026-02-12T13:00:00+00:00
Tokyo: The Megacity at Human Scale
City of Yes 2026-02-12T14:02:48+00:00
Zone2Source, a testing ground for art and ecology
We Make Money Not Art 2026-02-12T14:22:17+00:00
What Exact Products Do Games Sell, Two Case Studies
The Bottom Feeder 2026-02-12T19:00:18+00:00
Haiku Activity & Contract Report, January 2026
Haiku Project 2026-02-13T03:00:00+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-02-13T13:10:28+00:00
AI: New Frontiers
Mark Bernstein 2026-02-13T16:39:17+00:00
How Michael Abrash doubled Quake framerate
Fabien Sanglard 2026-02-14T00:00:00+00:00
The New Roman Empire: A History of Byzantium
Mark Bernstein 2026-02-14T02:13:48+00:00
Advertising for Love
The Curiosity Cabinet 2026-02-14T13:02:55+00:00
It’s In The Blood
Demon 2026-02-14T23:41:35+00:00
Text Posts from the Kids Group: 2025
Jeff Kaufman 2026-02-15T13:00:00+00:00
We need to continue to sing this song
gilest.org 2026-02-15T19:14:33+00:00
1998 Ebook!
The Digital Antiquarian 2026-02-16T13:32:30+00:00
Mental Health Chatbots: on Truth and Bullshit
Practical Ethics 2026-02-16T15:51:58+00:00
I swear the UFO is coming any minute
Experimental History 2026-02-17T16:15:11+00:00
Sizing Chaos
The Pudding 2026-02-18T06:00:00+00:00
The need to make art
Escaping Flatland 2026-02-18T10:42:17+00:00
Rethinking the Ethics and Politics of the Global Campaign Against Female Genital Cutting
Practical Ethics 2026-02-18T13:36:54+00:00
What is happening to writing?
Res Obscura 2026-02-18T14:52:04+00:00
[Outliers] Phil Knight: The Obsession That Built Nike
Farnam Street 2026-02-19T10:30:00+00:00
You May Already Be Canadian
Jeff Kaufman 2026-02-19T13:00:00+00:00
Everything’s in the Shitter
City of Yes 2026-02-19T15:02:47+00:00
New Book: Protecting Minds – The Right Against Mental Interference
Practical Ethics 2026-02-20T12:55:50+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-02-20T13:18:28+00:00
House-smoked tuna
Food is Stupid 2026-02-20T14:03:26+00:00
Gabriel Knight 3: Blood of the Sacred, Blood of the Damned
The Digital Antiquarian 2026-02-20T17:24:46+00:00
Chaos Commander Draft
Mediocre Magic 2026-02-20T20:56:00+00:00
Google Summer of Code 2026
Neovim 2026-02-21T00:00:00+00:00
Haiku to mentor interns in Google Summer of Code 2026
Haiku Project 2026-02-21T11:47:20+00:00
AI: Polite?
Mark Bernstein 2026-02-22T01:56:21+00:00
Storing Food
Jeff Kaufman 2026-02-22T13:00:00+00:00
Looking for a man who won't "ogle."
The Curiosity Cabinet 2026-02-22T14:15:40+00:00
Priority of idle hands
daverupert.com 2026-02-23T03:13:00+00:00
Smaller and dumber
daverupert.com 2026-02-23T05:33:00+00:00
Dollar Country 259: Country, Bluegrass, and Gospel from North Carolina
Dollar Country Newsletter & Radio Show 2026-02-23T20:08:35+00:00
The Secret History of Knocking on Wood
Res Obscura 2026-02-24T14:12:37+00:00
Reflecting on Self (human and AI)
ntoll.org 2026-02-24T18:00:00+00:00
0.34 Tournament Results
Dungeon Crawl Stone Soup 2026-02-25T01:07:54+00:00
Happy Map
The Pudding 2026-02-25T06:00:00+00:00
Be prepared for a cardtastic event…. a competition…. where even games may come true … !…
Muppe 2026-02-25T13:39:51+00:00
Inside the Mind of Robinhood Co-Founder Vlad Tenev
Farnam Street 2026-02-26T10:30:00+00:00
Getting a better sense for when you’re thinking well and when you’re faking it
Escaping Flatland 2026-02-26T11:39:21+00:00
Why “Plants Have Feelings Too” Is a Terrible Argument Against Veganism
Steelsnowflake 2026-02-26T12:40:13+00:00
Slaloming Towards Olympus
City of Yes 2026-02-26T14:59:23+00:00
Here's to the Polypropylene Makers
Jeff Kaufman 2026-02-27T13:00:00+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-02-27T13:20:29+00:00
Beefed corn
Food is Stupid 2026-02-27T14:01:03+00:00
‘It’s Physical, Not Intellectual’: The Ethics of Correcting Assumptions About Disability
Practical Ethics 2026-02-27T14:31:50+00:00
We The Bacteria. Notes Toward Biotic Architecture
We Make Money Not Art 2026-02-27T14:38:48+00:00
Make things OpenSAFELY, it makes things better
gilest.org 2026-02-27T15:42:55+00:00
Be prepared for a cardtastic event…. a competition…. where even games may come true … !…
Muppe 2026-02-28T02:06:41+00:00
WE ARE GOING LIVE WITH CARDWRIGHTS CARDS WILL BE BORN HOSTED BY:
Muppe 2026-02-28T02:06:53+00:00
Ponder This Challenge - March 2026 - Path game on a hole-riddled chessboard
IBM Ponder This 2026-02-28T22:00:00+00:00
Introducing and Deprecating WoFBench
Jeff Kaufman 2026-03-01T13:00:00+00:00
Balloons, bras, and body slams.
The Curiosity Cabinet 2026-03-01T14:06:42+00:00
Is Prostitution Just a Job?
Practical Ethics 2026-03-02T11:35:29+00:00
Can Pornography be Feminist in a Mass Market Economy?
Practical Ethics 2026-03-02T12:01:43+00:00
AI: afternoon, with Claude
Mark Bernstein 2026-03-02T17:43:17+00:00
⛵️ Painting with Rebelle 8 Pro
Nathan Youngman 2026-03-03T00:00:00+00:00
Public Listening: Jamie Lee's
Fujichia 2026-03-03T15:16:46+00:00
The one science reform we can all agree on, but we're too cowardly to do
Experimental History 2026-03-03T17:47:57+00:00
With Us or Against Us, Again
The Present Age 2026-03-03T23:21:35+00:00
💔 Animating with Moho 14.4
Nathan Youngman 2026-03-04T00:00:00+00:00
Dollar Country: The Missing Episodes (256, 257, 258)
Dollar Country Newsletter & Radio Show 2026-03-04T21:14:34+00:00
🎹 Learning to Play
Nathan Youngman 2026-03-05T00:00:00+00:00
[Outliers] J.W. Marriott: Building an Empire Without a Master Plan
Farnam Street 2026-03-05T10:30:00+00:00
> 204: At least he didn't get Earl
Laura Olin 2026-03-05T14:06:14+00:00
The Freedom of the City
City of Yes 2026-03-05T16:04:14+00:00
A Detailed Review of Like 8% of Mewgenics
The Bottom Feeder 2026-03-05T21:08:36+00:00
Haiku Activity & Contract Report, February 2026
Haiku Project 2026-03-06T01:30:00+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-03-06T12:11:23+00:00
Chaos drafts with Ooze (WUBRG Drafting)
Mediocre Magic 2026-03-06T13:27:00+00:00
A rudimentary paste
Food is Stupid 2026-03-06T14:03:52+00:00
The Mystery of Rennes-le-Château, Part 1: The Priest’s Treasure
The Digital Antiquarian 2026-03-06T16:50:42+00:00
hi hi! just wanted to pop in to say your guide to regency names is a godsend as an amateur regency historian and hopeful romance writer!! the tier system is perfect, and it’s made me feel better about all the characters I slap the name Mary onto. did you ever get around to doing some work on nicknames and/or accurate surnames? I’d love to hear of any primary sources you have for those two topics!
Ye Olde News 2026-03-06T19:51:12+00:00
Noem Reassigned to Made-up Position, WAR-slash-Epstein Files or Real Terrorism Concern, & Panic at the Pump
Kareem Abdul-Jabbar 2026-03-07T11:00:26+00:00
1500 Regency Era Last Names
Ye Olde News 2026-03-07T16:30:21+00:00
Chore Standards
Jeff Kaufman 2026-03-08T13:00:00+00:00
France is Bacon, Organic Idiocy and the Chinese Room
ntoll.org 2026-03-08T21:00:00+00:00
On the problem of landscape tarp
The Rot 2026-03-09T01:01:28+00:00
untitled
text-mode.org 2026-03-09T14:55:16+00:00
Dollar Country 260: Waltzes & 2-Steps (All Cajun no. 3)
Dollar Country Newsletter & Radio Show 2026-03-09T16:02:44+00:00
Billionaires Influence Elections, Boys With Pocketfuls of Cash, & China's Nuke 'em All Test?
Kareem Abdul-Jabbar 2026-03-10T10:03:23+00:00
Conflicted on Ramsey
Jeff Kaufman 2026-03-10T13:00:00+00:00
Schrödinger's War
The Present Age 2026-03-10T15:35:10+00:00
Some relationships deepen when you tell the truth and some end
Escaping Flatland 2026-03-11T08:27:40+00:00
How Many Parking Permits?
Jeff Kaufman 2026-03-11T13:00:00+00:00
Design Beyond the Human. Transdisciplinary Conversations about the Planet
We Make Money Not Art 2026-03-11T14:04:00+00:00
Why I (Mostly) Stopped Posting To Youtube
Dollar Country Newsletter & Radio Show 2026-03-11T15:40:24+00:00
Queen's Wish: A Portmortem Of Mixed Success
The Bottom Feeder 2026-03-11T20:32:34+00:00
untitled
STML 2026-03-12T09:05:19+00:00
Brookfield CEO Connor Teskey: AI Infrastructure, Data Centers, and the Future of Investing
Farnam Street 2026-03-12T09:30:00+00:00
urgent mutual aid request for immigrant family
Muppe 2026-03-12T18:04:25+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-03-13T12:12:49+00:00
Paved with Gold: The Hidden Costs of Free Transportation
City of Yes 2026-03-13T13:01:08+00:00
The Patriotic Press
The Present Age 2026-03-13T17:46:45+00:00
Double Standard at the Top, When One Tremor Ripples Across the Globe, & The Quiet Crisis in America’s Living Rooms
Kareem Abdul-Jabbar 2026-03-14T10:02:44+00:00
Photoshops Without Explication
Fujichia 2026-03-14T21:19:57+00:00
Foundations Draft #2 (WUBRG Drafting)
Mediocre Magic 2026-03-15T03:26:00+00:00
0.34.1 Bugfix Release
Dungeon Crawl Stone Soup 2026-03-15T04:31:16+00:00
One hundred curl graphs
daniel.haxx.se 2026-03-15T10:42:45+00:00
They'll Go First
The Present Age 2026-03-16T22:08:08+00:00
Wreckage of Iran Air Flight 655, 1988.
STML 2026-03-17T08:55:39+00:00
Robert Breer, Rug (1969)
STML 2026-03-17T09:10:55+00:00
Zero Evidence Against Powell, When Asylum Isn't Asylum, & The Mayor Who Won't Bend
Kareem Abdul-Jabbar 2026-03-17T10:02:55+00:00
Illustration of a 1982 Perfect Writer. “The manual accompanying Perfect Writer came with a fanciful…
STML 2026-03-17T10:23:20+00:00
Help I'm being persecuted
Experimental History 2026-03-17T13:33:13+00:00
COMPOST AFTER READING is officially OUT!
The Rot 2026-03-17T18:09:44+00:00
untitled
STML 2026-03-17T18:55:12+00:00
The Landscape Architecture of Auroras on Demand
BLDGBLOG 2026-03-18T18:23:31+00:00
"What if We Didn't Suck?"
The Present Age 2026-03-18T19:16:44+00:00
A Journey Through Infertility
The Pudding 2026-03-19T05:00:00+00:00
[Outliers] Harrison McCain: How to Create Demand for Something Nobody Wants
Farnam Street 2026-03-19T09:30:00+00:00
Disorder in the Liberal City
City of Yes 2026-03-19T13:36:46+00:00
International CGT in Japan, Day One Talks
Combinatorial Game Theory 2026-03-19T20:36:00+00:00
Know where your codes are
gilest.org 2026-03-19T21:48:48+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-03-20T12:26:29+00:00
People are not friction
daverupert.com 2026-03-20T15:54:00+00:00
The Mystery of Rennes-le-Château, Part 2: Secret Codes and Hidden Messages
The Digital Antiquarian 2026-03-20T18:09:52+00:00
International CGT in Japan, Day Two
Combinatorial Game Theory 2026-03-20T22:43:00+00:00
Putin’s Bubble Gets Smaller, Bondi Subpoenaed, & Cool It on the Judge Attacks, Says Chief Justice
Kareem Abdul-Jabbar 2026-03-21T10:02:50+00:00
bye bye RTMP
daniel.haxx.se 2026-03-21T14:06:12+00:00
The most deranged maniacs invading your world as Red Phantoms in Cradle of Nightmare
Dark RPGs 2026-03-21T19:17:50+00:00
Wonderful Toolchain project update - March 2026
Posts on asie's blog 2026-03-22T00:00:00+00:00
NTLM and SMB go opt-in
daniel.haxx.se 2026-03-22T11:41:09+00:00
International CGT in Japan, Day Three
Combinatorial Game Theory 2026-03-22T14:49:00+00:00
Contextual Collapse
BLDGBLOG 2026-03-22T16:59:13+00:00
Witness at the End of Time
Wild Information 2026-03-22T17:45:30+00:00
Nerd Deep Housecleaning Update, Part 1
The Bottom Feeder 2026-03-22T19:54:02+00:00
International CGT in Japan, Day Four
Combinatorial Game Theory 2026-03-23T08:59:00+00:00
Contra Dances Should Avoid Saturdays
Jeff Kaufman 2026-03-23T13:00:00+00:00
Differently free
Escaping Flatland 2026-03-23T17:04:07+00:00
Sloppelgängers
The Present Age 2026-03-23T19:21:55+00:00
Dollar Country 261: On A Highway Heading South
Dollar Country Newsletter & Radio Show 2026-03-24T03:18:41+00:00
When a Coin Becomes a Message, A War That Could Reshape the Global Economy, & Iran’s Violent Message to Its Citizens
Kareem Abdul-Jabbar 2026-03-24T10:02:47+00:00
A Spanish-Speaking Robot in my Pocket
Jeff Kaufman 2026-03-24T13:00:00+00:00
A New DCSS Server for South America
Dungeon Crawl Stone Soup 2026-03-25T00:30:50+00:00
One hundred weirdo emails
daniel.haxx.se 2026-03-25T08:05:41+00:00
AI: Physics
Mark Bernstein 2026-03-25T12:46:25+00:00
Label By Usable Volume
Jeff Kaufman 2026-03-25T13:00:00+00:00
collapse: data.models.worlds. What role does technology play in the intensifying state of crisis shaping our world?
We Make Money Not Art 2026-03-25T14:00:53+00:00
About your last post, where you wished you could code more to make an accurate regency name generator — just wondering if you’ve ever heard of perchance.org? It’s designed for making random generators with zero coding necessary, unless you want the generator to look pretty.
Ye Olde News 2026-03-25T17:19:01+00:00
A selection of strange and cryptic personal ads from The New York Herald, 1850s-1870s. 18/?
Ye Olde News 2026-03-25T18:00:39+00:00
New Martian Writing
Idle Words 2026-03-26T07:25:00+00:00
Joe Liemandt: Alpha School and the Future of Education
Farnam Street 2026-03-26T09:30:00+00:00
Don’t trust, verify
daniel.haxx.se 2026-03-26T10:09:07+00:00
When “Single-Family” Isn’t Family-Friendly
City of Yes 2026-03-26T13:03:26+00:00
A selection of strange and cryptic personal ads from The New York Herald, 1860s to 1870s. 17/?
Ye Olde News 2026-03-26T18:30:17+00:00
The IOC's New Policy Isn't Really a Trans Story
The Present Age 2026-03-26T21:55:21+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-03-27T12:32:52+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-03-27T12:36:53+00:00
Panda cotta
Food is Stupid 2026-03-27T13:22:57+00:00
AI and the human voice
gilest.org 2026-03-27T16:53:54+00:00
A selection of strange and cryptic personal ads from The New York Herald, 1860s to 1890s. 16/?
Ye Olde News 2026-03-27T18:30:27+00:00
How Social Media Became the New Tobacco, The Promise We Broke, & When Public Health Goes Quiet
Kareem Abdul-Jabbar 2026-03-28T10:01:48+00:00
Artemis II Is Not Safe to Fly
Idle Words 2026-03-28T12:29:00+00:00
A selection of strange and cryptic personal ads from The New York Herald, 1860s to 1890s. 15/?
Ye Olde News 2026-03-28T18:30:55+00:00
NYC Draft: Spiders, Turtles, Turtles
Mediocre Magic 2026-03-28T19:28:00+00:00
A selection of strange and cryptic personal ads from The New York Herald, 1860s to 1890s. 14/?
Ye Olde News 2026-03-29T18:30:48+00:00
A Guide to Common Regency-Era Nicknames
Ye Olde News 2026-03-30T19:07:55+00:00
They Know What "Wrong Place, Wrong Time" Means
The Present Age 2026-03-30T20:49:49+00:00
The Houthis Didn’t Suddenly Materialize, Where Accountability Goes to Die, Clowns Become Candidates, & History Has Receipts...Legal Spin Doesn’t.
Kareem Abdul-Jabbar 2026-03-31T10:02:41+00:00
Photos from my time in Iran, 2017
Res Obscura 2026-03-31T12:50:11+00:00
Minotaur Eyes
Steelsnowflake 2026-03-31T13:15:06+00:00
Infinite midwit
Experimental History 2026-03-31T16:05:29+00:00
Please my sneeze.
Muppe 2026-03-31T17:59:27+00:00
Ponder This Challenge - April 2026 - The Unlabeled Clock
IBM Ponder This 2026-03-31T22:00:00+00:00
You Don't Have To Be A Fool To Be A Fool.
Discworld MUD Dev Blog 2026-04-01T05:48:08+00:00
The Mystery of Rennes-le-Château, Part 3: A Secret History
The Digital Antiquarian 2026-04-01T07:05:28+00:00
Some reflections on Elena Conis’ lecture “Contextualising the Modern Era of Vaccination”
Practical Ethics 2026-04-01T09:18:57+00:00
Days are enormous
Escaping Flatland 2026-04-01T11:14:48+00:00
@rebellum who asked about this post - Question: what regions does this cover? You mention “the…
Ye Olde News 2026-04-01T20:10:11+00:00
I just really hate the word “fandom”. It’s just a portmanteau of “fan” and “random”. It sounds like some desperate attempt to be quirky and different. Plus, the word “fanbase” already exists.
Ye Olde News 2026-04-01T20:50:23+00:00
Epic Hero #2, Dungeon of Derojhen: Final Judgement
Renga in Blue 2026-04-02T02:06:55+00:00
More, and More Extensive, Supply Chain Attacks
Jeff Kaufman 2026-04-02T13:00:00+00:00
> 205: Something hopeful to show the world you hoped?
Laura Olin 2026-04-02T13:05:55+00:00
Picture Perfect. Challenging dominant Western beauty standards
We Make Money Not Art 2026-04-02T13:59:19+00:00
Remote Isn’t Working
City of Yes 2026-04-02T15:43:55+00:00
Useful
The Present Age 2026-04-02T20:46:31+00:00
Skull Cave: The Mystery of the Mazes
Renga in Blue 2026-04-02T22:31:32+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-04-03T12:16:38+00:00
Reconsider Challenging Sessions at Weekends
Jeff Kaufman 2026-04-03T13:00:00+00:00
Vegan ortolan
Food is Stupid 2026-04-03T13:04:03+00:00
A sleep aid
Interconnected 2026-04-03T17:14:00+00:00
Bondi Fired, Courage After Retirement, & A Library So Fancy "It" Forgot the Books
Kareem Abdul-Jabbar 2026-04-04T10:01:34+00:00
Chicken-Free Egg Whites
Jeff Kaufman 2026-04-04T13:00:00+00:00
Before I go: People like it when other people make things
daverupert.com 2026-04-04T17:00:00+00:00
Ozempic dreams
daverupert.com 2026-04-04T19:18:00+00:00
Songsoo Kim’s Rapini Doenjang Guk (Flowering Spring Greens Soup)
Vittles 2026-04-05T08:11:51+00:00
Spring Soups: A Vittles Cooking Supplement
Vittles 2026-04-05T08:21:52+00:00
Unsweetened Whipped Cream
Jeff Kaufman 2026-04-05T13:00:00+00:00
Listening for love or lust.
The Curiosity Cabinet 2026-04-06T01:50:16+00:00
The Phantom Ship / Yuureisen (1982)
Renga in Blue 2026-04-06T02:49:41+00:00
Nomic Coding Game
nklein software 2026-04-06T04:09:29+00:00
THE END OF THE PARTY
Infinite Gossip 2026-04-06T07:52:35+00:00
Destruction of Infrastructure for the Impact on Civilians is Manifestly Illegal
Jeff Kaufman 2026-04-06T13:00:00+00:00
Community Iftar
Practical Ethics 2026-04-06T17:18:34+00:00
MISSING PERSON PLEASE SHARE
Muppe 2026-04-06T23:31:13+00:00
Hospitality has a wage theft problem
Vittles 2026-04-07T07:39:09+00:00
Doctor Breaks Silence on Trump’s Health, A Cuba Policy Built on Pain, & A Championship Won the UCLA Way
Kareem Abdul-Jabbar 2026-04-07T10:01:35+00:00
Contra Dance Piano Teaching Videos
Jeff Kaufman 2026-04-07T13:00:00+00:00
Inverted themes with light-dark()
daverupert.com 2026-04-07T15:31:00+00:00
Curiosity Rover’s damaged wheels after 13 years, or 7.25 Martian years of service on the Red Planet….
STML 2026-04-08T19:08:43+00:00
Mario Harik: Playing to Win
Farnam Street 2026-04-09T09:30:00+00:00
Arrested Development
City of Yes 2026-04-09T14:00:53+00:00
Take Him Literally
The Present Age 2026-04-09T15:43:02+00:00
Message From Space No Talking
Fujichia 2026-04-09T22:15:54+00:00
Chaos hears the Call of the Ring (WUBRG Drafting)
Mediocre Magic 2026-04-10T01:25:00+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-04-10T12:14:29+00:00
At Dalston’s Ridley Road Indoor Market, a Community Fights for Its Survival
Vittles 2026-04-10T13:03:25+00:00
Baked alphabet
Food is Stupid 2026-04-10T13:04:14+00:00
Seemingly in cahoots
gilest.org 2026-04-10T13:13:03+00:00
mist is now open source and looking for interop
Interconnected 2026-04-10T16:35:00+00:00
When Both Sides Declare Victory, Who Drops Nearly a Billion Before a Ceasefire, & Typhus Is Back in L.A.
Kareem Abdul-Jabbar 2026-04-11T10:01:41+00:00
Iterating the potatoes
gilest.org 2026-04-11T14:10:54+00:00
The Phantom Ship / Yuureisen: Mounds of Verbs
Renga in Blue 2026-04-11T20:20:03+00:00
Sprouts 2026 Summaries
Combinatorial Game Theory 2026-04-12T17:34:00+00:00
Sprouts 2026 Afterthoughts
Combinatorial Game Theory 2026-04-12T18:18:00+00:00
Good news from Hungary
Crooked Timber 2026-04-13T03:45:06+00:00
Combining Rate and Instructions to Create Beautiful Madness
a blog by biggiemac42 2026-04-13T06:39:10+00:00
How Hurricane Melissa Affected Food and Farming in Jamaica
Vittles 2026-04-13T07:45:38+00:00
Happy Birthday, Dorothy Lynch
Midwesterner 2026-04-13T11:02:41+00:00
When moving fast, talking is the first thing to break
daverupert.com 2026-04-13T15:10:00+00:00
Haiku Activity & Contract Report, March 2026 (ft. ARM64)
Haiku Project 2026-04-14T02:00:00+00:00
How to walk through walls
Escaping Flatland 2026-04-14T07:22:57+00:00
Pluto’s Hillary Mountains
STML 2026-04-14T08:52:59+00:00
How many babies do we want? How many will we have?
Crooked Timber 2026-04-14T09:17:32+00:00
A Two‑Week Sprint Into a Forty‑Year Problem, When Political Theater Meets a 2,000‑Year‑Old Institution, & The Real Reason Stress Keeps Winning
Kareem Abdul-Jabbar 2026-04-14T10:02:14+00:00
History Nerd Bucket List: The Jenny Geddes Stool
Crooked Timber 2026-04-14T11:06:13+00:00
The "Foremost War Skeptic"
The Present Age 2026-04-14T13:44:44+00:00
Soil Turn—A Field Guide to Artistic Earthly Engagements
We Make Money Not Art 2026-04-14T14:21:52+00:00
Nothing ever dies. It merely becomes embarrassing.
Experimental History 2026-04-14T16:16:42+00:00
I Will Never Respect A Website
Ed Zitron's Where's Your Ed At 2026-04-14T16:22:59+00:00
How to use your compost now that it's officially spring
The Rot 2026-04-14T16:54:20+00:00
Ideas for Mickey
Fujichia 2026-04-14T17:46:32+00:00
I’ve just spent about two hours reading through ALL your Rachel & Co. posts. Thank you for sharing all these wonderful letters!
Ye Olde News 2026-04-14T19:45:30+00:00
The Phantom Ship / Yuureisen: Cursed Defiler
Renga in Blue 2026-04-15T03:38:06+00:00
Music break: Baba Yetu
Crooked Timber 2026-04-15T09:12:06+00:00
The Dorothy Lynch Red Beer
Midwesterner 2026-04-15T11:01:16+00:00
The chewy, nutty snack from Isfahan
Snack Stack 2026-04-15T12:17:00+00:00
‘Once Queensway Market is gone, there won’t be anything like it left.’
Vittles 2026-04-15T14:01:55+00:00
I don't want a screenshot of your Claude conversation
daverupert.com 2026-04-15T15:17:00+00:00
A Real Delivery
The Present Age 2026-04-15T16:42:42+00:00
This week in Rachel & Co. history…
Ye Olde News 2026-04-15T20:09:56+00:00
EsoNatLangs Bring the Complexity of Natural Language into Code
esoteric.codes 2026-04-16T05:22:00+00:00
EP-40 Riddim cheat sheet
Spongefile 2026-04-16T18:00:23+00:00
Global science equity – towards solutions
Crooked Timber 2026-04-17T07:38:21+00:00
Where to Eat Outside of London This Weekend
Vittles 2026-04-17T09:18:36+00:00
Dorothy Lynch Everything
Midwesterner 2026-04-17T11:03:41+00:00
The Work of Community
City of Yes 2026-04-17T12:15:43+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-04-17T12:26:35+00:00
Limburger Bay Biscuits
Food is Stupid 2026-04-17T13:03:16+00:00
The Mystery of Rennes-le-Château, Part 4: Non-Fiction Meets Fiction
The Digital Antiquarian 2026-04-17T16:12:17+00:00
Premium: The Hater's Guide to Private Credit
Ed Zitron's Where's Your Ed At 2026-04-17T16:57:30+00:00
Are those what my grand mother called leg-of-mutton sleeves?
Ye Olde News 2026-04-17T17:36:35+00:00
The Palace Says It Cares, But Actions Tell a Different Story, Tax Day Shows the Gap, & Trump vs. The Pope Who Doesn’t Need His Approval
Kareem Abdul-Jabbar 2026-04-18T10:01:31+00:00
Fifteen Years Aboard
Jeff Kaufman 2026-04-18T13:00:00+00:00
Headless everything for personal AI
Interconnected 2026-04-18T17:00:00+00:00
Bobby, I hardly Knew Ye
Crooked Timber 2026-04-19T03:40:07+00:00
MixedHTML Mode for Emacs
Jeff Kaufman 2026-04-19T13:00:00+00:00
Marry your boss?
The Curiosity Cabinet 2026-04-19T13:19:40+00:00
Sunday photoblogging: Pézenas street
Crooked Timber 2026-04-19T19:56:35+00:00
Eat This, Not That
Vittles 2026-04-20T07:21:21+00:00
Archive Dive: Still Renting After All These Years
The Deleted Scenes 2026-04-20T12:55:46+00:00
Exclusive: Microsoft To Shift GitHub Copilot Users To Token-Based Billing, Tighten Rate Limits
Ed Zitron's Where's Your Ed At 2026-04-20T17:11:58+00:00
Dollar Country 262: Too Tough To Die
Dollar Country Newsletter & Radio Show 2026-04-20T17:17:58+00:00
Thank You For Being a Friend
Coding Horror 2026-04-20T17:21:00+00:00
Occasional paper: Inconstant moon
Crooked Timber 2026-04-20T21:46:33+00:00
OnionWars
The Present Age 2026-04-20T22:30:52+00:00
Free Newsletter Tuesday: Midterm Panic, What's Up With the California Democratic Party, & The $8 Billion Machine Sprint
Kareem Abdul-Jabbar 2026-04-21T10:02:58+00:00
AI AI Captain! Der Wienerschnitzel Edition
The Deleted Scenes 2026-04-21T12:55:58+00:00
Automated Deanonymization is Here
Jeff Kaufman 2026-04-21T13:00:00+00:00
Courier: real-time messaging for ESP32 with batteries included (new library)
Interconnected 2026-04-21T15:25:00+00:00
Four Horsemen of the AIpocalypse
Ed Zitron's Where's Your Ed At 2026-04-21T16:28:59+00:00
My Wikipedia Edits
Fujichia 2026-04-21T17:32:06+00:00
10,000-watt GPU meet 40-watt lump of meat
daverupert.com 2026-04-21T19:36:00+00:00
[UPDATED] News: Anthropic (Briefly) Removes Claude Code From $20-A-Month "Pro" Subscription Plan For New Users
Ed Zitron's Where's Your Ed At 2026-04-21T22:44:29+00:00
Chaos Clue Draft #2 (WUBRG Drafting)
Mediocre Magic 2026-04-22T00:57:00+00:00
Greg Brockman: Inside the 72 Hours That Almost Killed OpenAI
Farnam Street 2026-04-22T07:41:08+00:00
Nick Bramham’s Spanakorizo
Vittles 2026-04-22T07:53:06+00:00
untitled
STML 2026-04-22T10:14:48+00:00
High-Quality Chaos
daniel.haxx.se 2026-04-22T11:44:40+00:00
Seeing Red
The Deleted Scenes 2026-04-22T12:55:31+00:00
Your Supplies Probably Won't Be Stolen in a Disaster
Jeff Kaufman 2026-04-22T13:00:00+00:00
The handmade beauty of Machine Age data visualizations
Res Obscura 2026-04-22T13:05:46+00:00
The Scolding
The Present Age 2026-04-22T16:18:08+00:00
[Updated] Exclusive: Microsoft Moving All GitHub Copilot Subscribers To Token-Based Billing In June
Ed Zitron's Where's Your Ed At 2026-04-22T17:24:17+00:00
When The Rubber Meets The Road
The Deleted Scenes 2026-04-23T12:55:53+00:00
Sanctuary Suburbs
City of Yes 2026-04-23T13:01:35+00:00
On Reinforcing Cynicism in the Academy
Crooked Timber 2026-04-24T07:43:41+00:00
Six Unexpectedly Exceptional Breakfasts
Vittles 2026-04-24T10:31:47+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-04-24T12:24:39+00:00
New and Old #263
The Deleted Scenes 2026-04-24T12:55:48+00:00
Contra Events Pairing Callers By Age?
Jeff Kaufman 2026-04-24T13:00:00+00:00
Weird Dutch pizza
Food is Stupid 2026-04-24T13:03:27+00:00
Premium: How OpenAI Kills Oracle
Ed Zitron's Where's Your Ed At 2026-04-24T16:40:45+00:00
The “Freakout” and the “Abyss”
The Present Age 2026-04-24T17:01:02+00:00
The Wind in the Willows and reading out loud
Interconnected 2026-04-24T17:56:00+00:00
Dismissing Excellence in the Highest Court, The Brief Life of a MAGA Secretary, & Stagflation 2.0
Kareem Abdul-Jabbar 2026-04-25T10:02:10+00:00
It took me far too long to realize that the final word was “blond” as in blonde lace, and Lady…
Ye Olde News 2026-04-25T18:35:16+00:00
The Phantom Ship / Yuureisen: Say Amen
Renga in Blue 2026-04-26T05:36:51+00:00
Chatty Chatty Change Change
Discworld MUD Dev Blog 2026-04-26T06:20:58+00:00
Sunday photoblogging: l’Abbaye de Valmagne
Crooked Timber 2026-04-26T07:41:01+00:00
Dot’s Home thoughts
The Virtual Moose 2026-04-26T12:55:42+00:00
An excerpt from the trial of Elinor Crane, who was arrested in Middlesex in 1693 on suspicion of…
Ye Olde News 2026-04-26T15:11:09+00:00
"A 50-year-old sociopath?"
The Curiosity Cabinet 2026-04-26T18:41:54+00:00
Patrick: An Illustrated Essay
Vittles 2026-04-27T08:01:07+00:00
Patrick
Vittles 2026-04-27T08:06:18+00:00
Thoughts about making a career as a writer
Escaping Flatland 2026-04-27T08:26:41+00:00
A museum about museums
gilest.org 2026-04-27T11:16:41+00:00
First Train To Clarksburg?
The Deleted Scenes 2026-04-27T12:55:50+00:00
Contra Binder on far-UVC and filtration
Jeff Kaufman 2026-04-27T13:00:00+00:00
vigilance
Weird Fucking Games 2026-04-27T16:11:55+00:00
A Bad Look
The Present Age 2026-04-27T18:09:12+00:00
Allotment engineering
gilest.org 2026-04-28T06:58:36+00:00
The Illusion of Security at the Washington Hilton, MTG Keeps Sounding the Alarm, & Vatican Dinners and Venture Capital
Kareem Abdul-Jabbar 2026-04-28T10:03:26+00:00
Occasional paper: Blue Angels, Devil Hands
Crooked Timber 2026-04-28T11:03:08+00:00
Friction and Reactionary Politics
The Deleted Scenes 2026-04-28T12:26:11+00:00
Interview As Funeral Cone
Fujichia 2026-04-28T15:46:29+00:00
AI's Economics Don't Make Sense [Ad Free]
Ed Zitron's Where's Your Ed At 2026-04-28T16:33:46+00:00
AI's Economics Don't Make Sense
Ed Zitron's Where's Your Ed At 2026-04-28T16:35:07+00:00
The 3rd Annual Blog Post Competition, Extravaganza, and Jamboree
Experimental History 2026-04-28T17:48:50+00:00
OpenAI Projects ChatGPT Plus subscriptions to drop by 80% from 44 Million in 2025 to 9 Million In 2026, Made Up Using Cheaper Subscriptions (Somehow)
Ed Zitron's Where's Your Ed At 2026-04-28T22:40:34+00:00
How I make a microbe shirt
The Rot 2026-04-28T23:54:17+00:00
curl 8.20.0
daniel.haxx.se 2026-04-29T06:27:01+00:00
With A Capital T That's Next To S Which Stands For Sky(scraper)
The Deleted Scenes 2026-04-29T12:26:05+00:00
Are "Vintage LLMs" the start of a new humanistic field?
Res Obscura 2026-04-29T12:45:56+00:00
Let Kids Keep More Productivity Gains
Jeff Kaufman 2026-04-29T13:00:00+00:00
PS5’s latest DRM fiasco appears to be not as bad as first thought, but some official communication from Sony would be great
PS5 – Destructoid 2026-04-29T15:29:11+00:00
A.P.E
Weird Fucking Games 2026-04-29T16:11:55+00:00
We need RSS for sharing abundant vibe-coded apps
Interconnected 2026-04-29T17:58:00+00:00
St. Andrew’s Adventure (1983)
Renga in Blue 2026-04-29T20:23:52+00:00
Inspired
daniel.haxx.se 2026-04-30T06:49:47+00:00
Approaching zero bugs?
daniel.haxx.se 2026-04-30T08:08:34+00:00
Who Binds You?
The Deleted Scenes 2026-04-30T12:55:58+00:00
Against In-Duct UV
Jeff Kaufman 2026-04-30T13:00:00+00:00
The Plaza and the Parking Lot
City of Yes 2026-04-30T13:02:54+00:00
yeoldenews: yeoldenews: yeoldenews: In April of 1896 Will...
Ye Olde News 2026-04-30T17:18:10+00:00
Ponder This Challenge - May 2026 - The Powers of a Binary Matrix
IBM Ponder This 2026-05-01T06:00:00+00:00
Haiku to mentor 3 students in Google Summer of Code 2026
Haiku Project 2026-05-01T08:00:00+00:00
Beyond the Hype at London’s Newest Viral Sandwich Spot
Vittles 2026-05-01T09:54:25+00:00
New and Old #264
The Deleted Scenes 2026-05-01T12:55:49+00:00
Filthy soda
Food is Stupid 2026-05-01T13:03:06+00:00
The First Amendment
The Present Age 2026-05-01T14:57:39+00:00
Escape From Sparta (1983)
Renga in Blue 2026-05-01T15:08:31+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-05-01T15:36:46+00:00
The Mystery of Rennes-le-Château, Part 5: The Man Behind the Curtain
The Digital Antiquarian 2026-05-01T15:46:03+00:00
Easter Egg
Weird Fucking Games 2026-05-01T16:49:06+00:00
Behind the Scenes of London's Most Influential Restaurant Group w/ Songsoo Kim
Vittles 2026-05-02T09:32:56+00:00
More Than Just a Map, The Art of the Endless Distraction, & How Much Is a Dream Worth?
Kareem Abdul-Jabbar 2026-05-02T10:01:14+00:00
A New DCSS Server for the US West Coast
Dungeon Crawl Stone Soup 2026-05-02T19:25:21+00:00
Sunday photoblogging: Canigou and cherry trees
Crooked Timber 2026-05-03T06:30:10+00:00
[GSoC 2026] Modernizing Haiku’s Bluetooth stack: Implementing support for HFP profile
Haiku Project 2026-05-03T06:37:41+00:00
[GSoC 2026] Bluetooth: HCI Improvements & HID Profile | Haiku Project
Haiku Project 2026-05-03T13:46:03+00:00
Text Adventures Still Rule in the Year 2026
The Virtual Moose 2026-05-03T14:18:54+00:00
Horror House (1983)
Renga in Blue 2026-05-03T15:17:45+00:00
PARA//LLAX
Weird Fucking Games 2026-05-03T16:49:05+00:00
The duality of language models in the browser
daverupert.com 2026-05-04T00:33:00+00:00
Comparisons as Predictable as the Sunrise
The Pudding 2026-05-04T05:00:00+00:00
The history of London's squat cafes
Vittles 2026-05-04T08:04:20+00:00
"Urbanist Sprawl" Revisited
The Deleted Scenes 2026-05-04T12:55:58+00:00
Alarming Scheduling
Jeff Kaufman 2026-05-04T13:00:00+00:00
Dollar Country 263: Alabama, Georgia, & Mississippi
Dollar Country Newsletter & Radio Show 2026-05-04T14:03:08+00:00
Premium: The AI Compute Demand Story Is A Lie
Ed Zitron's Where's Your Ed At 2026-05-04T14:09:22+00:00
Meandering Along the Alabama River
FYFD 2026-05-04T15:00:00+00:00
PRISM: The T100 Version
Renga in Blue 2026-05-04T20:24:24+00:00
Perfect Tides: Station to Station thoughts
The Virtual Moose 2026-05-04T21:23:07+00:00
Retaliation is Not a Strategy, The Cienfuegos Ghost, & The Silence of NASA
Kareem Abdul-Jabbar 2026-05-05T10:02:58+00:00
Ghost Of The Highways
The Deleted Scenes 2026-05-05T12:55:26+00:00
Don’t Fall for the Tucker Carlson Apology Tour
The Present Age 2026-05-05T13:26:49+00:00
Vibe Check №42
daverupert.com 2026-05-05T13:41:00+00:00
Fluids Can Fracture
FYFD 2026-05-05T15:00:00+00:00
When Should we Argue?
Practical Ethics 2026-05-05T15:20:03+00:00
One Weird Trick
Fujichia 2026-05-05T15:42:38+00:00
Made a Flickgame
The Virtual Moose 2026-05-05T15:50:03+00:00
Same Game, Different Music
Weird Fucking Games 2026-05-05T16:49:05+00:00
Theros: Face the Hydra (WUBRG Sealed)
Mediocre Magic 2026-05-05T17:10:00+00:00
The Rise and Fall and Rise Again of the American Bald Eagle
Steelsnowflake 2026-05-05T20:03:41+00:00
Warp Door's April 2026 Roundup
Warp Door 2026-05-06T02:22:33+00:00
The world reveals itself to those who travel by foot
Escaping Flatland 2026-05-06T09:10:51+00:00
Eight People, One Hob
Vittles 2026-05-06T10:27:09+00:00
Buffet Chronicles: Eat Like A Mongol?
The Deleted Scenes 2026-05-06T12:55:31+00:00
Plucking Droplets
FYFD 2026-05-06T15:00:00+00:00
Am I Meant To Be Impressed?
Ed Zitron's Where's Your Ed At 2026-05-06T15:13:07+00:00
hey, i love your rachel and co project. every few weeks i find myself coming back and rereading some of the posts. one thing i was wondering is how aunt gussie is related to everyone is she rachel’s dad’s sister?
Ye Olde News 2026-05-06T19:13:30+00:00
My grand mother told me her mother told her of doing washing for rich ppl at a place called Sylvan Beach and ironing leg-of-mutton sleeves with solid metal sad irons.
Ye Olde News 2026-05-06T19:20:20+00:00
Hopscotch (FeatureKreep)
Warp Door 2026-05-07T00:20:32+00:00
Winston Weinberg: Speed, Stress, and Better Decisions
Farnam Street 2026-05-07T09:55:00+00:00
untitled
text-mode.org 2026-05-07T10:24:07+00:00
untitled
text-mode.org 2026-05-07T10:42:33+00:00
untitled
text-mode.org 2026-05-07T10:48:35+00:00
How to Ranch-Wash Anything
Midwesterner 2026-05-07T11:01:00+00:00
The Arrival (Edward4hands)
Warp Door 2026-05-07T11:30:13+00:00
The Curious Early D.C. Suburbs, Wheaton, Maryland Edition
The Deleted Scenes 2026-05-07T12:55:55+00:00
The End of Urban Renewal
City of Yes 2026-05-07T13:02:46+00:00
Inside an Ear
FYFD 2026-05-07T15:00:00+00:00
Radio Galaxy
Weird Fucking Games 2026-05-07T18:49:05+00:00
Chthosis (Mathias Waltz)
Warp Door 2026-05-08T00:21:22+00:00
[GSoC 2026] Expanding the functionality of the Haiku Devices Application
Haiku Project 2026-05-08T02:10:41+00:00
I want my MTV
Interconnected 2026-05-08T02:51:00+00:00
VAPOR GALLERY (Liam Kenna)
Warp Door 2026-05-08T03:21:44+00:00
Same Impala?
Vittles 2026-05-08T09:32:52+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-05-08T12:40:29+00:00
New and Old #265
The Deleted Scenes 2026-05-08T12:55:24+00:00
AI is Breaking Two Vulnerability Cultures
Jeff Kaufman 2026-05-08T13:00:00+00:00
We put on a show
gilest.org 2026-05-08T13:30:31+00:00
Uncertainty as a field of action. An interview with Amanda Masha Caminals
We Make Money Not Art 2026-05-08T14:11:59+00:00
Premium: AI's Circular Psychosis
Ed Zitron's Where's Your Ed At 2026-05-08T14:40:45+00:00
“Spiralling Textures”
FYFD 2026-05-08T15:00:00+00:00
This Week on The Analog Antiquarian
The Digital Antiquarian 2026-05-08T16:11:33+00:00
Nova Sonata
Warp Door 2026-05-09T07:09:07+00:00
For us, by us
gilest.org 2026-05-09T07:56:16+00:00
Minifold 01: First Fold (Pingfan Jie)
Warp Door 2026-05-09T10:02:09+00:00
The Borrowed Future, The Infrastructure of Inequality & The Photo-Op Summit
Kareem Abdul-Jabbar 2026-05-09T10:02:56+00:00
Somerville Porchfest 2026
Jeff Kaufman 2026-05-09T13:00:00+00:00
Signal Garden
Weird Fucking Games 2026-05-09T18:49:05+00:00
Store Draft became a Team vs Hordes Draft (WUBRG Drafting)
Mediocre Magic 2026-05-09T19:36:00+00:00
Marges Destimbats (Crumbled Stone Walls) thoughts
The Virtual Moose 2026-05-09T20:50:11+00:00
Sunday photoblogging: Pézenas, maison consulaire
Crooked Timber 2026-05-10T07:47:39+00:00
Dual Bore Janko Venova
Jeff Kaufman 2026-05-10T13:00:00+00:00
Blog Roundup (May 10, 2026)
The Virtual Moose 2026-05-10T14:42:17+00:00
Cow (Demo) (ZDEsy)
Warp Door 2026-05-10T14:45:34+00:00
Meanderware thoughts
The Virtual Moose 2026-05-10T18:57:42+00:00
My Eyes Are Up Here
Discworld MUD Dev Blog 2026-05-10T23:22:54+00:00
Mythos finds a curl vulnerability
daniel.haxx.se 2026-05-11T06:01:35+00:00
From The People’s Bank to the Banker’s Bank
Crooked Timber 2026-05-11T07:11:42+00:00
The Rise and Fall of Mercato Metropolitano
Vittles 2026-05-11T07:27:14+00:00
places, i (droqen, Remi Marchand, Sakib Chowdhury)
Warp Door 2026-05-11T08:21:04+00:00
For a Good Time, Call 347-1111
Midwesterner 2026-05-11T11:12:35+00:00
Archive Dive: When I Say "City," You Say...
The Deleted Scenes 2026-05-11T12:56:05+00:00
Loved the Incredibly Ambitious Interactive Fiction Game PARA//LLAX
The Virtual Moose 2026-05-11T13:22:56+00:00
Liquid Pulleys and Gears
FYFD 2026-05-11T15:00:00+00:00
Death And Taxes
Discworld MUD Dev Blog 2026-05-11T18:40:02+00:00
Fish Bone
Weird Fucking Games 2026-05-11T18:49:05+00:00
This Arcade Game Lets You Invade Iran as Trump
The Present Age 2026-05-11T21:57:26+00:00
INLAND FROM SEAWORLD
Infinite Gossip 2026-05-12T02:41:32+00:00
Haiku Activity & Contract Report, April 2026
Haiku Project 2026-05-12T03:30:00+00:00
Poingle! (Demo) (SlappyHappy2000)
Warp Door 2026-05-12T08:44:31+00:00
The UFO Files, The Hollywood Reset, & Don't Buy the Gold Card Hype
Kareem Abdul-Jabbar 2026-05-12T10:03:30+00:00
Date with a T-Rex <3 (rabbytt)
Warp Door 2026-05-12T11:40:44+00:00
Red Hot And Green
The Deleted Scenes 2026-05-12T12:56:13+00:00
The text is not the product
Crooked Timber 2026-05-12T13:19:45+00:00
Shame them, shun them, ban them, beat them!
Experimental History 2026-05-12T13:23:30+00:00
Made Another Flickgame
The Virtual Moose 2026-05-12T13:26:01+00:00
Jets From Impact
FYFD 2026-05-12T15:00:00+00:00
Where Are All The Data Centers?
Ed Zitron's Where's Your Ed At 2026-05-12T16:17:30+00:00
Stop Using Experimental Art As A Cudgel
The Virtual Moose 2026-05-12T23:00:17+00:00
Showstopper, Centre Piece
Vittles 2026-05-13T07:45:07+00:00
Thanassis Stavrakis, A man carrying a sheep on a motorcycle during a wildfire in Patras, western…
STML 2026-05-13T10:57:56+00:00
Montgomery Inn Forever
Midwesterner 2026-05-13T11:03:24+00:00
Iron Maiden T-Shirt With Ice Cream
Fujichia 2026-05-13T11:40:36+00:00
05/13/2026
Dwarf Fortress Development Log 2026-05-13T12:00:00+00:00
2026-05-13: DF 53.13 Released
Dwarf Fortress Development Log 2026-05-13T12:00:00+00:00
Is "Good Friction" A Bad Idea?
The Deleted Scenes 2026-05-13T12:55:50+00:00
How the “Impossible Torpedo” Worked
FYFD 2026-05-13T15:00:00+00:00
The Coal Room
Weird Fucking Games 2026-05-13T18:49:05+00:00
Nobody Asked for This Washington Post Podcast
The Present Age 2026-05-13T20:33:58+00:00
“Doesn’t 23 seem awfully old for a girl to be. Last night my 23 candles looked like immeasurable…
Ye Olde News 2026-05-13T21:12:54+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T01:05:14+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T01:06:24+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T01:08:07+00:00
Straight men w nice butts are like sick shame and twisted sin wrapped up and i need to hit!
HORSEPUSSY GALORE 2026-05-14T01:13:53+00:00
You’ve inspired me to bare my soul…and my pussy!
HORSEPUSSY GALORE 2026-05-14T01:19:01+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T01:21:32+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T01:52:50+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T02:02:31+00:00
Nail 'Em (Harold Krell)
Warp Door 2026-05-14T02:16:29+00:00
Sort Sol f. Lydia Lunch - Boy/Girl
HORSEPUSSY GALORE 2026-05-14T02:17:39+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T02:18:20+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T02:24:57+00:00
[Outliers] Chung Ju-yung: The Hyundai Founder Who Put a Country on His Back
Farnam Street 2026-05-14T09:50:00+00:00
Weather (Nass Reda-Fathmi)
Warp Door 2026-05-14T10:29:08+00:00
In 1997, local television in Kharkiv accidentally filmed one of the most iconic rave moments in…
HORSEPUSSY GALORE 2026-05-14T11:20:25+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T11:59:52+00:00
Eraserhead baby makes waffles for you!
HORSEPUSSY GALORE 2026-05-14T12:07:34+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T12:24:01+00:00
untitled
HORSEPUSSY GALORE 2026-05-14T12:27:02+00:00
"Two Wheels Good" Semi-Review
The Deleted Scenes 2026-05-14T12:55:16+00:00
The Public Square Is Not Online
City of Yes 2026-05-14T13:02:55+00:00
Why they stopped building wooden stupas
Res Obscura 2026-05-14T13:23:31+00:00
> 206: But why the last? I ask.
Laura Olin 2026-05-14T14:09:26+00:00
Seeing Stress in an Avalanche
FYFD 2026-05-14T15:00:00+00:00
Is this a sex thing? It feels like a sex thing.
Muppe 2026-05-14T16:28:39+00:00
Two-Player Sealed vs Minotaur Horde and Xenagos Revel (WUBRG Sealed)
Mediocre Magic 2026-05-14T16:51:34+00:00
Keep your shorthand to yourself
Muppe 2026-05-14T16:51:51+00:00
Creation and Invention Are Games We All Play
The Bottom Feeder 2026-05-14T17:37:10+00:00
That firefighting game I played in Toronto
Zarf Updates
Those ZIL grammar flags
Zarf Updates
Spring games of the id
Zarf Updates
Visible Zork 3 is now available to all
Zarf Updates
2026 Hugo Award finalists
Zarf Updates
A bunch of games with nothing in common
Zarf Updates
A Cornerstone interpreter and the mu machine
Zarf Updates
The Curse of the Forgotten Adverbs
Zarf Updates
Ludic Narrans
Zarf Updates
GDC: gloom and haruspicy
Zarf Updates
Visible Zorker: March status report
Zarf Updates
Twine and Zork at GDC
Zarf Updates
The Game Narrative Kaleidoscope
Zarf Updates
1989 in context
Zarf Updates
Visible Zorker: status report
Zarf Updates
GDC plans, 2026
Zarf Updates
When is a bug not a bug?
Zarf Updates
To fight a troll
Zarf Updates
The Beacon is lit
Zarf Updates
Chronological order
Zarf Updates
The Visible Zorker Project (and Patreon)
Zarf Updates
2026 IGF nominees
Zarf Updates
The Visible Zorker 2
Zarf Updates
NarraScope is open for submissions
Zarf Updates
Adorable little games that you should just go play
Zarf Updates
Moving away from Tailwind, and learning to structure my CSS
Julia Evans
Links to CSS colour palettes
Julia Evans
Testing Vue components in the browser
Julia Evans
Examples for the tcpdump and dig man pages
Julia Evans
Notes on clarifying man pages
Julia Evans
Some notes on starting to use Django
Julia Evans
A data model for Git (and other docs updates)
Julia Evans
Notes on switching to Helix from vim
Julia Evans
New zine: The Secret Rules of the Terminal
Julia Evans
Using `make` to compile C programs (for non-C-programmers)
Julia Evans
Standards for ANSI escape codes
Julia Evans
How to add a directory to your PATH
Julia Evans
Some terminal frustrations
Julia Evans
What's involved in getting a "modern" terminal setup?
Julia Evans
"Rules" that terminal programs follow
Julia Evans
Why pipes sometimes get "stuck": buffering
Julia Evans
Importing a frontend Javascript library without a build system
Julia Evans
New microblog with TILs
Julia Evans
ASCII control characters in my terminal
Julia Evans
Using less memory to look up IP addresses in Mess With DNS
Julia Evans
The agent principal-agent problem
David Crawshaw
I am building a cloud
David Crawshaw
Eight more months of agents
David Crawshaw
How I program with Agents
David Crawshaw
How I program with LLMs
David Crawshaw
jsonfile: a quick hack for tinkering
David Crawshaw
new year, same plan
David Crawshaw
log4j: between a rock and a hard place
David Crawshaw
Software I’m thankful for
David Crawshaw
Remembering the LAN
David Crawshaw
The asymmetry of Internet identity
David Crawshaw
Zero Trust Networks
David Crawshaw
Go 1.13: xerrors
David Crawshaw
Fast compilers for fast programs
David Crawshaw
UTF-7: a ghost from the time before UTF-8
David Crawshaw
One process programming notes (with Go and SQLite)
David Crawshaw
Reasoning with Regret
David Crawshaw
Searching the Creative Internet
David Crawshaw
Service Throughput Tradeoffs
David Crawshaw
Sharp-Edged Finalizers in Go
David Crawshaw
The Tragedy of Finalizers
David Crawshaw
Go and SQLite: when database/sql chafes
David Crawshaw
Experimentation Adrift
David Crawshaw
Leaving Google
David Crawshaw
Less cgo overhead in Go 1.8
David Crawshaw
BBR
David Crawshaw
Compiler Bomb
David Crawshaw
On recieving the News
David Crawshaw
Buried by the media
David Crawshaw
Smaller Go 1.7 binaries
David Crawshaw
Good business
David Crawshaw
Everyone a writer
David Crawshaw
2016-06-29
David Crawshaw
Transaction oriented collector
David Crawshaw
Machining under a microscope
David Crawshaw
Limits of Superintelligence
David Crawshaw
COPY Relocations
David Crawshaw
Atom Feed
David Crawshaw
2016-02-10
David Crawshaw
2016-01-23
David Crawshaw
2016-01-18
David Crawshaw
2016-01-15
David Crawshaw
2016-01-09
David Crawshaw
2016-01-07
David Crawshaw
2016-01-05
David Crawshaw
2016-01-04
David Crawshaw
2016-01-03
David Crawshaw
2016-01-02
David Crawshaw
2016-01-01
David Crawshaw
2015-12-29
David Crawshaw
Under the heel of the spirit
David Crawshaw
2015-12-27
David Crawshaw
2015-12-26
David Crawshaw
2015-12-20
David Crawshaw
2015-12-15
David Crawshaw
2015-12-04
David Crawshaw
2015-11-18
David Crawshaw
2015-11-16
David Crawshaw
2015-10-13
David Crawshaw
2015-08-07
David Crawshaw
2015-08-04
David Crawshaw
2015-07-27
David Crawshaw
2015-07-17
David Crawshaw
2015-07-15
David Crawshaw
2015-07-14
David Crawshaw
2015-07-07
David Crawshaw
2015-06-26
David Crawshaw
2015-06-24
David Crawshaw
2015-06-22
David Crawshaw
2015-06-01
David Crawshaw
2015-05-08
David Crawshaw
2015-05-07
David Crawshaw
2015-04-02
David Crawshaw
2015-03-10
David Crawshaw
2015-03-09
David Crawshaw
2015-03-01
David Crawshaw
2015-01-11
David Crawshaw
2015-01-10
David Crawshaw
2014-12-11
David Crawshaw
2014-07-28
David Crawshaw
2014-06-13
David Crawshaw
2014-05-14
David Crawshaw
2014-05-06
David Crawshaw
2014-04-18
David Crawshaw
2014-03-08
David Crawshaw
2014-01-17
David Crawshaw
SyncMaster of the Universe
Leaded Solder
Loonies for Loongsons
Leaded Solder
Make Your Own ColecoVision At Home (Part 5 - Making More)
Leaded Solder
Untrashing a TRS-80
Leaded Solder
Leaded Solder vs. The Crazy 77
Leaded Solder
Controlling the Wizzard
Leaded Solder
Giving the SPARCstation some jumper cables
Leaded Solder
Commodore 64 black screen failure round-up
Leaded Solder
You’re Out of Timer
Leaded Solder
Three Times the Fun
Leaded Solder
Simple gpx export from ridewithgps
Dima Kogan
mrcal 2.5 released!
Dima Kogan
Meshroom packaged for Debian
Dima Kogan
Using libpython3 without linking it in; and old Python, g++ compatibility patches
Dima Kogan
Eigen macro specializations crashes
Dima Kogan
Getting precise timings out of RS-232 output
Dima Kogan
Shop scheduling with PuLP
Dima Kogan
When are the days getting longer the fastest?
Dima Kogan
Strava track filtering validation
Dima Kogan
GNU Make: details regarding intermediate files
Dima Kogan
Speeding up JavaScript function with AI help
Krzysztof Kowalczyk blog
How to run msvc cl.exe from command-line (powershell)
Krzysztof Kowalczyk blog
Novel login system for web apps
Krzysztof Kowalczyk blog
Benchmarking JSON vs TOON in Go
Krzysztof Kowalczyk blog
From JSON to TOON
Krzysztof Kowalczyk blog
Fixing Zed's debugger keybindings
Krzysztof Kowalczyk blog
Ideas for faster web dev cycle
Krzysztof Kowalczyk blog
Zed debug setup for go server / Svelte web app
Krzysztof Kowalczyk blog
Stage manager in Mac OS
Krzysztof Kowalczyk blog
AltTab for Mac OS
Krzysztof Kowalczyk blog
lazy import of JavaScript modules
Krzysztof Kowalczyk blog
Using await in Svelte 5 components
Krzysztof Kowalczyk blog
vite /rollup manualChunks
Krzysztof Kowalczyk blog
Increase software sales by 50% or more
Krzysztof Kowalczyk blog
File sync is very slow
Krzysztof Kowalczyk blog
New Edna feature: multiple notes
Krzysztof Kowalczyk blog
Evolving Edna Ask AI UI
Krzysztof Kowalczyk blog
Desktop UI frameworks written by a single person
Krzysztof Kowalczyk blog
Implementing UI translation in SumatraPDF, a C++ Windows application
Krzysztof Kowalczyk blog
Calling Grok, OpenAI, Anthropic, Google, OpenRouter API from the browser
Krzysztof Kowalczyk blog
Case study of over-engineered C++ code
Krzysztof Kowalczyk blog
Increase open file limit on Ubuntu Linux
Krzysztof Kowalczyk blog
Explaining nil interface{} gotcha in Go
Krzysztof Kowalczyk blog
Size textarea to content
Krzysztof Kowalczyk blog
All about Svelte 5 snippets
Krzysztof Kowalczyk blog
Don't Use aidev-mode
Language Agnostic
Arbitrary Update 0leinzfmdpg
Language Agnostic
3D Printing Field Report
Language Agnostic
AI Multipliers
Language Agnostic
Light and Spin
Language Agnostic
Arbitrary Update 9999
Language Agnostic
Not Dead Yet
Language Agnostic
Models In The Wild
Language Agnostic
The Scratchpad Talk
Language Agnostic
Chop and aidev
Language Agnostic
TASM Notes, January 9th, 2025
Language Agnostic
Making LLMs Do What You Want to your Files
Language Agnostic
Making LLMs Do What You Want - Interlude
Language Agnostic
Making LLMs Do More of What You Want
Language Agnostic
Making LLMs Do What You Want
Language Agnostic
GCP is Bullshit and Here's Why
Language Agnostic
Antler - Elegy
Language Agnostic
TASM Notes, May 23rd, 2024
Language Agnostic
TASM Notes, May 16th, 2024
Language Agnostic
TASM Notes, May 5th 2024
Language Agnostic
esbuild can build css
Julia Evans: TIL
Al Sweigart's Python books are available for free
Julia Evans: TIL
Resources for upgrading Django
Julia Evans: TIL
You don't have to close <p> or <li> tags
Julia Evans: TIL
Advice for writing alt text
Julia Evans: TIL
fx: a jq replacement
Julia Evans: TIL
CSS supports nested selectors now!
Julia Evans: TIL
You can use `fzf` to review git commits
Julia Evans: TIL
strace has a --stack-traces option
Julia Evans: TIL
In CSS you can populate `content:` with a `data-` attribute
Julia Evans: TIL
Environment variables with no equals sign
Julia Evans: TIL
Two ways the mouse wheel works in the terminal
Julia Evans: TIL
You can run `tty` to see your current TTY
Julia Evans: TIL
strace's `--tips`
Julia Evans: TIL
Tiny IP-KVM devices exist
Julia Evans: TIL
Emoji Kitchen
Julia Evans: TIL
pip install --user can override system libraries
Julia Evans: TIL
why the text disappers from my PDF when I print it
Julia Evans: TIL
`**` works for globbing in the shell
Julia Evans: TIL
Some programming languages buffer stdout and some don't
Julia Evans: TIL
Is Zig's New Writer Unsafe?
openmymind.net
Everything is a []u8
openmymind.net
I'm too dumb for Zig's new IO interface
openmymind.net
Zig's new Writer
openmymind.net
Zig's new LinkedList API (it's time to learn @fieldParentPtr)
openmymind.net
Allocator.resize
openmymind.net
ArenaAllocator.free and Nested Arenas
openmymind.net
Zig's dot star syntax (value.*)
openmymind.net
GetOrPut With String Keys
openmymind.net
Comparing Strings as Integers with @bitCast
openmymind.net
Switching on Strings in Zig
openmymind.net
Using Generics to Inject Stubs when Testing
openmymind.net
In Zig, What's a Writer?
openmymind.net
Using SIMD to Tell if the High Bit is Set
openmymind.net
Peeking Behind Zig Interfaces by Creating a Dummy std.Random Implementation
openmymind.net
Comptime as Configuration
openmymind.net
Zig's @bitCast
openmymind.net
Basic Awareness in Addition to Deep Understanding
openmymind.net
Sorting Strings in Zig
openmymind.net
Gluing JSON
openmymind.net
Functional Classes in Clojure
The Clean Code Blog
Functional Classes
The Clean Code Blog
Space War
The Clean Code Blog
Functional Duplications
The Clean Code Blog
Roots
The Clean Code Blog
More On Types
The Clean Code Blog
On Types
The Clean Code Blog
if-else-switch
The Clean Code Blog
Pairing Guidelines
The Clean Code Blog
Solid Relevance
The Clean Code Blog
Loopy
The Clean Code Blog
Conference Conduct
The Clean Code Blog
The Disinvitation
The Clean Code Blog
REPL Driven Design
The Clean Code Blog
A Little More Clojure
The Clean Code Blog
A Little Clojure
The Clean Code Blog
A New Hope
The Clean Code Blog
Open Letter to the Linux Foundation
The Clean Code Blog
What They Thought of Programmers.
The Clean Code Blog
Circulatory
The Clean Code Blog
MUD Day Postponed to 20 June
The CRPG Addict
Upcoming Games: Al-Qadim (1994), The Odyssey (1993), Escape from Ragor (1994), Dungeon Arcade (1987), Pagan: Ultima VIII (1994), Warriors and Warlocks (1983), Ravenloft: Strahd's Possession (1994)
The CRPG Addict
Game 577: Yendorian Tales: Book I
The CRPG Addict
Nobunaga's Ambition: BASIC bushido
Data Driven Gamer
Nobunaga's Ambition: Won!
Data Driven Gamer
Game 470: Nobunaga's Ambition
Data Driven Gamer
Game 469: Battle of Kawanakajima
Data Driven Gamer
Paradroid: Won!
Data Driven Gamer
Game 468: Paradroid
Data Driven Gamer
Games 465-467: Hewson Consultants and the 3D Seiddab trilogy
Data Driven Gamer
Superauthenticity: Arcade game aspect ratios
Data Driven Gamer
Game 464: Gun.Smoke
Data Driven Gamer
Xanadu: Won!
Data Driven Gamer
Xanadu: How to train your dragon slayer
Data Driven Gamer
Xanadu: Tickling the dragon
Data Driven Gamer
Xanadu: Sea of squares
Data Driven Gamer
Xanadu: Anxious powergaming
Data Driven Gamer
Xanadu: Full plate and packing steel
Data Driven Gamer
Xanadu: Honey tongue, butter fingers
Data Driven Gamer
Xanadu: Pick poor Robin clean
Data Driven Gamer
Xanadu: I expect you to buy
Data Driven Gamer
Xanadu: Magic
Data Driven Gamer
Game 463: Xanadu: Dragon Slayer II
Data Driven Gamer
Silent Service: Tang & final rating
Data Driven Gamer
Silent Service: Seawolf
Data Driven Gamer
Game 462: Silent Service
Data Driven Gamer
ST Pawn
Data Driven Gamer
The Pawn: Won!
Data Driven Gamer
Nintendo Promotional Toys: Kanebo's Dash Rider (ダッシュライダー) from the 1970s
beforemario
Nintendo produced 1960s promotional card set
beforemario
Nintendo Home Race (ホームレース, ca 1966)
beforemario
Nintendo Playing Cards catalogue from 2001
beforemario
A Nintendo Pilgrimage [part 7 of 7]: My Unforgettable Week in Kyoto
beforemario
A Nintendo Pilgrimage [part 6 of 7]: My Unforgettable Week in Kyoto
beforemario
A Nintendo Pilgrimage [part 5 of 7]: My Unforgettable Week in Kyoto
beforemario
A Nintendo Pilgrimage [part 4 of 7]: My Unforgettable Week in Kyoto
beforemario
A Nintendo Pilgrimage [part 3 of 7]: My Unforgettable Week in Kyoto
beforemario
A Nintendo Pilgrimage [part 2 of 7]: My Unforgettable Week in Kyoto
beforemario
Nintendo playing cards featuring Marilyn Monroe
beforemario
A Nintendo Pilgrimage [part 1 of 7]: My Unforgettable Week in Kyoto
beforemario
From Cards to Condiments: Nintendo’s Ads in a Disney Booklet from the 1960s
beforemario
Ten years of Before Mario book memories
beforemario
Meet the Collectors - #13 - Elijah Luttmann
beforemario
Nintendo ad in 1960s Playboy magazine
beforemario
The Nintendo Museum: my first impressions
beforemario
A Treasure in Kyoto: Rediscovering Nintendo’s First Ad from 1894
beforemario
Nintendo Museum's 2024 Ultra Hand Remake: honors and improves the original
beforemario
Nintendo Poitan Game, a water toy lost in time (ポイタンゲーム, 1966)
beforemario
Nintendo toys in 1977 Kiddy Land catalogue
beforemario
The Project Odyssée team visits Before Mario
beforemario
Nintendo Patriotic Cards from 1942 and 1943 (Aikoku Hyakunin Isshu / 愛國百人一首)
beforemario
Spot the difference: Ultra(s)cope box variants
beforemario
Nintendo Love Peace "Smiley" e-clock (Love Peace 電気時計, circa 1971)
beforemario
Announcing RimWorld World
Ludeon Studios
The winter merch collection is here! ❄️
Ludeon Studios
Holiday trade caravans are on their way!
Ludeon Studios
Update 1.6.4630 released
Ludeon Studios
Bring home a thrumbo and boomalope!
Ludeon Studios
Announcing the thrumbo figure and boomalope night light!
Ludeon Studios
Update 1.6.4566 improves gravships, shuttles, and more
Ludeon Studios
Update 1.6.4543 released
Ludeon Studios
Update 1.6.4535 released
Ludeon Studios
Update 1.6.4528 released
Ludeon Studios
Problems with video recreations of classic pinball
@Play Collected
How to Get Started Playing Mystery Dungeon
@Play Collected
@Play 87: Interview with Josh Ge, Creator of Cogmind
@Play Collected
@Play 86: Interview with Dr. Thomas Biskup, Creator of ADOM
@Play Collected
Slashware's game Ananias releases on Steam
@Play Collected
Zelda Randomizer set to stream at 2 PM Eastern
@Play Collected
Stuff concerning @Play, Zelda Randomizer and other things
@Play Collected
Roguelike Celebration, Notes on My Talk
@Play Collected
Something called the Casino Dungeon
@Play Collected
Progress on 86
@Play Collected
@Play 85: A Talk with Digital Eel, Makers of the Infinite Space Games
@Play Collected
7DRL Home Stretch!
@Play Collected
@Play 84: The Rescue of Meta-Zelda
@Play Collected
Update: next column, StoryBundle results
@Play Collected
@Play 83: HyperRogue
@Play Collected
The book is out! "@Play: Exploring Roguelike Games"
@Play Collected
Nethack 3.6 is out!
@Play Collected
Not done yet
@Play Collected
EXTRA: Satoru Iwata knew what roguelikes are
@Play Collected
@Play 82: The Talks of the International Roguelike Developers Conference US, 2015
@Play Collected
EXTRA: Junethack
@Play Collected
Upcoming: @Play 82 on IRDC US 2015
@Play Collected
International Roguelike Developers Conference, Atlanta GA
@Play Collected
EXTRA: Roguelike Radio celebrates 100 episodes!
@Play Collected
EXTRA: Bay12 Games (of Dwarf Fortress) has a Patreon
@Play Collected
Graphics Studies Compilation
Adrian Courrèges
UE4 Optimized Post-Effects
Adrian Courrèges
Metal Gear Solid V - Graphics Study
Adrian Courrèges
Beware of Transparent Pixels
Adrian Courrèges
DOOM (2016) - Graphics Study
Adrian Courrèges
GTA V - Graphics Study - Part 3
Adrian Courrèges
GTA V - Graphics Study - Part 2
Adrian Courrèges
GTA V - Graphics Study
Adrian Courrèges
Print Copy of SupCom Graphics Study
Adrian Courrèges
Exp3D Goes Open-Source
Adrian Courrèges
Supreme Commander - Graphics Study
Adrian Courrèges
Introducing Linux Visual Novel Reader
Adrian Courrèges
Deus Ex: Human Revolution - Graphics Study
Adrian Courrèges
Customizing IRKit Firmware: LED and Offline Mode
Adrian Courrèges
Introducing IRKit Web Remote
Adrian Courrèges
IRKit Setup Guide for Android, iOS, Linux, Mac, Windows
Adrian Courrèges
Beam Waves Live Wallpaper for Android
Adrian Courrèges
Website Makeover
Adrian Courrèges
Exp3D for Android and Web-Browser
Adrian Courrèges
5.1 sound with nForce chipset under Feisty
Adrian Courrèges
Ludum Dare 26
Big Bad Wofl
Wherefore art I?
Big Bad Wofl
What have I been doing?
Big Bad Wofl
Really need a name for this thing...
Big Bad Wofl
Announcing [Insert Name Here]
Big Bad Wofl
The Final Secret
Big Bad Wofl
It's been a good run
Big Bad Wofl
More Secrets
Big Bad Wofl
Secret Project
Big Bad Wofl
Morf Feedback
Big Bad Wofl
Random World Generator
Big Bad Wofl
Morf is back!
Big Bad Wofl
Random River Generation
Big Bad Wofl
Play Morf Now!
Big Bad Wofl
Morf, JavaScript and Laziness
Big Bad Wofl
Well, would you look at that
Big Bad Wofl
Morf: Alpha Version
Big Bad Wofl
Morf
Big Bad Wofl
The Official Website is up!
Big Bad Wofl
Terrain Coloring and Trees
Big Bad Wofl
Display Lists and Combination Shaders
Big Bad Wofl
Shaders
Big Bad Wofl
Ludum Dare 25: Post Mortem
Big Bad Wofl
Ludum Dare: 13 hours
Big Bad Wofl
Ludum Dare: 11 hours
Big Bad Wofl
The Bottom Feeder Has Moved On!
The Bottom Feeder
Getting Sweet Patron Money On the Modern Internet
The Bottom Feeder
Queen's Wish Is Out. Here's Why It's So Weird!
The Bottom Feeder
I Am the Cheapest Bastard In Indie Games
The Bottom Feeder
Why All Of Our Games Look Like Crap
The Bottom Feeder
Make Them Want. Delay. Fulfill. Repeat.
The Bottom Feeder
The Glorious, Profitable, Inescapable Art of Addiction
The Bottom Feeder
We Did Our First Kickstarter! And It Worked!
The Bottom Feeder
Divinity: Original Sin 2 and the Rewards of Doing One Hard Thing Right
The Bottom Feeder
I Gave a Big Talk On Indie Games and It's Pretty Good.
The Bottom Feeder
We Released Avernum 3: Ruined World.
The Bottom Feeder
Cuphead, Cruelty, and Selling Unfairness to You.
The Bottom Feeder
I Settle All Video Game Arguments, Part 2: What Is a Game?
The Bottom Feeder
I Settle All Video Game Arguments, Part 1: Game Reviews
The Bottom Feeder
Avernum 3, Remasters, and the Joy of Owning Your Work.
The Bottom Feeder
The Life and Merciful Death of the Fad Controller
The Bottom Feeder
Persona 5, Cartoon Cats, Depthless Evil, and Dating Your Teacher.
The Bottom Feeder
Games Have Too Many Words: A Case Study.
The Bottom Feeder
Does Your Video Game Have Too Many Words? (Yeah, Probably.)
The Bottom Feeder
Writing Indie Games Is Like Being a Musician. In the Bad Way.
The Bottom Feeder
We Are No Longer Supporting Android. Sigh.
The Bottom Feeder
A Very Long Post About How to Become a Creator.
The Bottom Feeder
We Released Avadon 3! (Also, a Few Words About Free Time)
The Bottom Feeder
No, Video Games Aren't Art. We're BETTER.
The Bottom Feeder
To Be a Pro is to Be Abused.
The Bottom Feeder
Quaker principles line up quite well with modern...
kottke.org
The Revolt Against the Girl Bosses…...
kottke.org
The Night Witches: The Female Nazi Hunters of WWII
kottke.org
25 Books That Capture This American Moment . They...
kottke.org
Software Developers Say AI Is Rotting Their Brains ....
kottke.org
On “rich guy has an opinion” journalism ,...
kottke.org
World History Timeline
kottke.org
New York’s Neue Galerie Will Merge With the Metropolitan...
kottke.org
The Neanderthal dentist: archaeologists found evidence...
kottke.org
Sarah Rose (who is blind): “Meta glasses are...
kottke.org
How Russell Vought Became the Shadow President
kottke.org
Omg, Amazon Prime inserted an ad for Febreze in the...
kottke.org
A very good, very 2026 headline: Japan Runs Out of Robot...
kottke.org
When Your Participation Is Decoration
kottke.org
“I believe in myself. That’s why I...
kottke.org
Shape of Dreams (Zendaya × Spike Jonze)
kottke.org
The Guardian asked authors, critics, and academics to...
kottke.org
The 2026 National Recording Registry inductees were...
kottke.org
This Tiny Celestial Body Past Pluto Shouldn’t Have an...
kottke.org
A Moment That Changed Me: I Saw My First Total Solar...
kottke.org
What Childhood Folklore Did You Learn As a Kid?
kottke.org
I Want to Live Like Costco People . “Embracing the...
kottke.org
Robin Sloan writes about the personalized, AI-written,...
kottke.org
What Would J.R.R. Tolkien Think of Palantir?
kottke.org
Meet the Sad Wives of AI . “Princess Diana...
kottke.org
Just dropped: Foo Fighters’ Tiny Desk Concert ....
kottke.org
Stop-Motion Lego Dr. Strangelove
kottke.org
Adam Serwer : “Violence serves an authoritarian...
kottke.org
A map of the regions of the US , as voted on by Reddit...
kottke.org
“So, at about 14, I became the team’s...
kottke.org
The World Press Freedom Index at Global 25-Year Low
kottke.org
Sounds of the 60s: the IBM 1401 (punchcard collation,...
kottke.org
Jamelle Bouie thinks Alexandria Ocasio-Cortez is...
kottke.org
Being Fed Content
kottke.org
Study: “A few weeks of X’s algorithm can...
kottke.org
Can You See the World When You Close Your Eyes?
kottke.org
We’re Diversifying the University by Hiring More...
kottke.org
Digg has (sorta) relaunched (again) and instead of an...
kottke.org
How NASA Built Artemis II’s Fault-Tolerant...
kottke.org
Remember Desktop Tower Defense ? I played it for a bit...
kottke.org
Taken : this is a web page that shows how much data your...
kottke.org
Mesmerizing 4K Video of a Cat-5 Super Typhoon
kottke.org
Interesting thread about why rural towns don’t...
kottke.org
Wallace & Gromit 24/7 Livestream
kottke.org
People Who Don’t Like People Are Making All of Our...
kottke.org
Grandma Stand
kottke.org
Where are the public benches on the internet?...
kottke.org
Now open in NYC: a pop-up called The Donald J. Trump and...
kottke.org
The Hidden Cassettes . “This is going to sound...
kottke.org
In 1951, the Civil Rights Congress submitted a petition...
kottke.org
Someone in a private forum I belong to mentioned fountain...
kottke.org
Wowsabout!
kottke.org
An analysis of 18 years of Guardian blind dates....
kottke.org
Pioneering abstract artist Hilma af Klint’s...
kottke.org
The Design Evolution of Screwdriver Handles
kottke.org
What Can We Do About Partisan Gerrymandering? Jamelle...
kottke.org
Nolen Royalty: “My latest project is Marc...
kottke.org
The 2025 Alaskan Tsunami That Measured 1578 Feet Tall
kottke.org
Prophecy At 1420 MHz is the first single from Boards of...
kottke.org
It’s David Attenborough’s 100th birthday...
kottke.org
Dragoncatcher: Laying it on thick
Robin Sloan
Dragoncatcher: News travels too fast these days
Robin Sloan
Dragoncatcher: Referer reality
Robin Sloan
Dragoncatcher: Claude Managed Agents feature request
Robin Sloan
Dragoncatcher: Tone control, part 2
Robin Sloan
Dragoncatcher: Talkie and Claude (no, the other one)
Robin Sloan
Shopkeeper Rampant
Robin Sloan
Dragoncatcher: The milestone of Gemma 4
Robin Sloan
Dragoncatcher: Tinfoil
Robin Sloan
Dragoncatcher: Reasoning models don't so much think as navigate
Robin Sloan
Dragoncatcher: The Galactica option
Robin Sloan
Dragoncatcher: Sweat the details
Robin Sloan
Dragoncatcher: The bat of fate
Robin Sloan
Winter Garden: Where is it like to be a language model?
Robin Sloan
Dragoncatcher: Vector voxels
Robin Sloan
Dragoncatcher: Cosleuth
Robin Sloan
Dragoncatcher: Elemental content
Robin Sloan
Good trains
Robin Sloan
Dragoncatcher: Wrangler init woes
Robin Sloan
Dragoncatcher: Maybe the G in AGI stands for Gemini
Robin Sloan
Does meditation experience improve success with the jhanas?
Nadia Asparouhova
How to do the jhanas
Nadia Asparouhova
Working notes for Summer of Protocols
Nadia Asparouhova
Explaining tech’s notion of talent scarcity
Nadia Asparouhova
Mapping digital worlds
Nadia Asparouhova
Early stage funding markets for science - an analysis
Nadia Asparouhova
Mapping out the tribes of climate
Nadia Asparouhova
Cultivating agency
Nadia Asparouhova
Idea machines
Nadia Asparouhova
Understanding science funding in tech, 2011-2021
Nadia Asparouhova
Passkey transfer
Some Bits: Nelson's Linkblog
Reddit Russian propaganda
Some Bits: Nelson's Linkblog
NVME erasing
Some Bits: Nelson's Linkblog
2fa 1337
Some Bits: Nelson's Linkblog
USB Cheat Sheet
Some Bits: Nelson's Linkblog
xdg-ninja
Some Bits: Nelson's Linkblog
Kyle Kingsbury Podcast Podcast
Some Bits: Nelson's Linkblog
Is GitHub Cooked?
Some Bits: Nelson's Linkblog
Containers vs VMs
Some Bits: Nelson's Linkblog
India health survey (PDF)
Some Bits: Nelson's Linkblog
Medicat USB
Some Bits: Nelson's Linkblog
RNGdle
Some Bits: Nelson's Linkblog
Grass Valley welcome arch
Some Bits: Nelson's Linkblog
Oil Refineries
Some Bits: Nelson's Linkblog
AI goblins
Some Bits: Nelson's Linkblog
Sniffies $100M
Some Bits: Nelson's Linkblog
1Password + Flatpak browser
Some Bits: Nelson's Linkblog
rpm-ostree
Some Bits: Nelson's Linkblog
GitHub update
Some Bits: Nelson's Linkblog
NSF board fired
Some Bits: Nelson's Linkblog
Linux VRAM management
Some Bits: Nelson's Linkblog
Trump vs DAF
Some Bits: Nelson's Linkblog
Gay Kaiser scandal
Some Bits: Nelson's Linkblog
Duncan Grant / Bathing
Some Bits: Nelson's Linkblog
Claude postmortem
Some Bits: Nelson's Linkblog
Building a cloud
Some Bits: Nelson's Linkblog
Series A for exe.dev
Some Bits: Nelson's Linkblog
wsl9x
Some Bits: Nelson's Linkblog
1966 Sip-In
Some Bits: Nelson's Linkblog
pi.dev
Some Bits: Nelson's Linkblog
The Conversation (free)
Some Bits: Nelson's Linkblog
Biennale corruption
Some Bits: Nelson's Linkblog
South (1959 TV)
Some Bits: Nelson's Linkblog
Jukka and Tane (NSFW)
Some Bits: Nelson's Linkblog
AIs in Math
Some Bits: Nelson's Linkblog
smol machines
Some Bits: Nelson's Linkblog
Unsloth Qwen3.6
Some Bits: Nelson's Linkblog
Gyro monorail
Some Bits: Nelson's Linkblog
AI risks to the Internet
Some Bits: Nelson's Linkblog
The best response is to stop
Some Bits: Nelson's Linkblog
CRPG Romance
Some Bits: Nelson's Linkblog
63 Chinese Cuisines
Some Bits: Nelson's Linkblog
anemoia
Some Bits: Nelson's Linkblog
plastic, prism, void
Some Bits: Nelson's Linkblog
Light a Candle for Claude
Some Bits: Nelson's Linkblog
Crunchy Chili Crisps
Some Bits: Nelson's Linkblog
Charcuterie
Some Bits: Nelson's Linkblog
ugetty
Some Bits: Nelson's Linkblog
Chinese Cooking Demystified
Some Bits: Nelson's Linkblog
I+G for savory flavor
Some Bits: Nelson's Linkblog
The Future of Everything is Lies, I Guess
Some Bits: Nelson's Linkblog
Switching to OpenStreetMap
Some Bits: Nelson's Linkblog
Hunky Jesus 2026 (NSFWish)
Some Bits: Nelson's Linkblog
Farrow on Altman
Some Bits: Nelson's Linkblog
guppylm
Some Bits: Nelson's Linkblog
Adobe fuckery
Some Bits: Nelson's Linkblog
Jujutsu Tutorial
Some Bits: Nelson's Linkblog
RIP Stuey Weills
Some Bits: Nelson's Linkblog
AI security reviews
Some Bits: Nelson's Linkblog
Learn and Test DMARC
Some Bits: Nelson's Linkblog
GrindrPlus
Some Bits: Nelson's Linkblog
Moonfrost
Some Bits: Nelson's Linkblog
White House terrible app
Some Bits: Nelson's Linkblog
Homocore anthology
Some Bits: Nelson's Linkblog
Understanding passkeys
Some Bits: Nelson's Linkblog
Sahlins biography
Some Bits: Nelson's Linkblog
Squares in Squares
Some Bits: Nelson's Linkblog
about exe.dev
Some Bits: Nelson's Linkblog
curl > /dev/sda
Some Bits: Nelson's Linkblog
pr-review
Some Bits: Nelson's Linkblog
Left-right split in Paris
Some Bits: Nelson's Linkblog
DMARC statistics
Some Bits: Nelson's Linkblog
Ubuntu connectivity check down
Some Bits: Nelson's Linkblog
Bluesky $100M VC funding
Some Bits: Nelson's Linkblog
BYD Flash Charging
Some Bits: Nelson's Linkblog
Tesla AI crash
Some Bits: Nelson's Linkblog
In search of Banksy
Some Bits: Nelson's Linkblog
"Mark Lawrence" AI slop
Some Bits: Nelson's Linkblog
Google DNS cache flush
Some Bits: Nelson's Linkblog
MALUS license washing
Some Bits: Nelson's Linkblog
ZyncPDF
Some Bits: Nelson's Linkblog
Landscape khipu
Some Bits: Nelson's Linkblog
Perma.cc
Some Bits: Nelson's Linkblog
Modular robots
Some Bits: Nelson's Linkblog
AI ethics and market
Some Bits: Nelson's Linkblog
Parseword
Some Bits: Nelson's Linkblog
Attensity!
Some Bits: Nelson's Linkblog
Trump pardon industry
Some Bits: Nelson's Linkblog
Superpowers
Some Bits: Nelson's Linkblog
GitHub status
Some Bits: Nelson's Linkblog
Japanese Glory Hole
Some Bits: Nelson's Linkblog
Vietnamese Cajun
Some Bits: Nelson's Linkblog
Agentic Engineering Patterns
Some Bits: Nelson's Linkblog
Joy in resistance
Some Bits: Nelson's Linkblog
Musk PAC voter fraud
Some Bits: Nelson's Linkblog
Andor and US fascism
Some Bits: Nelson's Linkblog
IRCv3
Some Bits: Nelson's Linkblog
OpenFactBook
Some Bits: Nelson's Linkblog
"Remigration"
Some Bits: Nelson's Linkblog
Distillation attacks
Some Bits: Nelson's Linkblog
Black & White Snacks
One Foot Tsunami
💧 Quick and Creepy
One Foot Tsunami
The Septuagenarian Resident
One Foot Tsunami
At Least AC/DC Would Be Proud
One Foot Tsunami
💧 Happy Mother’s Lengths of Time
One Foot Tsunami
One Hell of a Pop Quiz
One Foot Tsunami
Popes — They’re Just Like Us!
One Foot Tsunami
Divorce Registries
One Foot Tsunami
Foiling Online Age Checks
One Foot Tsunami
Oracle Park’s Bogus 9-9-9 Challenge Has Disappeared
One Foot Tsunami
A Very Poor Trade
One Foot Tsunami
Renea Gamble Prevails
One Foot Tsunami
Pam From Wenatchee Made a Hologram
One Foot Tsunami
The Bartered Vasectomy
One Foot Tsunami
💧 Sub-Two Sabastian Sawe
One Foot Tsunami
A Very Fusilli Plan
One Foot Tsunami
Dropping in Unannounced
One Foot Tsunami
💧 Sectional 42
One Foot Tsunami
We’ve Got to Hang Our Hats on Something
One Foot Tsunami
Snagged on a Giant C
One Foot Tsunami
Vehicles Crushed by Snow
One Foot Tsunami
💧 The Magawa Monument Is Made of Stone
One Foot Tsunami
Pivoting From Shoes to Artificial Intelligence
One Foot Tsunami
💧 Stay in Your Lane, Paperless Post
One Foot Tsunami
A Massive Magawa
One Foot Tsunami
Where Is Everybody
Today in Tabs
It's Lamer Than You Think
Today in Tabs
Dopefish
Today in Tabs
Goblin Problem
Today in Tabs
Ballroom Twits
Today in Tabs
Papal Bull
Today in Tabs
5.6 Million Bees
Today in Tabs
Everything Is Hacked Now
Today in Tabs
The Future of Football Starts Today
Today in Tabs
Who Goes AI?
Today in Tabs
Aham for the Mild-Built
Today in Tabs
Dick Hebdige Explained Dinergoth in 1979
Today in Tabs
Sokath, His Eyes Uncovered
Today in Tabs
Purity Supreme
Today in Tabs
They Don't Care
Today in Tabs
The Lowbrow Harper's
Today in Tabs
A.I. Isn't People
Today in Tabs
The Assassination of The Washington Post by the Coward Jeff Bezos
Today in Tabs
Masks Off
Today in Tabs
Welcome to the Resistance, Driving Range Guys
Today in Tabs
Your CrossFit App Doesn’t Know What You Did
Perfection Kills
CrossFit training in the age of AI
Perfection Kills
Overnight success
Perfection Kills
What’s my XENOM score?
Perfection Kills
Reflections on training, 2025 → ‘26
Perfection Kills
CrossFit tracking app but… you’re in control?
Perfection Kills
My Fitness: from spreadsheet to an app
Perfection Kills
PRzilla: CrossFit AI companion
Perfection Kills
The science of Vipassana
Perfection Kills
Vipassana through the modern lens
Perfection Kills
How can coffee taste peachy?
April Cools' Club
🙏: please or thank you?
April Cools' Club
My year of reading Chinese history
April Cools' Club
Chants of Sennaar: A Review
April Cools' Club
On Trees. But Not Those Trees.
April Cools' Club
Myst's Minecart Maze Is Great Actually
April Cools' Club
Don't Call It a Comedown
April Cools' Club
Chicago vs New York Pizza is the Wrong Argument
April Cools' Club
The Self-Cancelling Subscription
April Cools' Club
The underrated benefits of always having oatmeal at lunch
April Cools' Club
Puzzlehunts
April Cools' Club
My Experience As A Rice Farmer
April Cools' Club
Come ho programmato un videogioco per Game Boy sull’amicizia con GB Studio (How I programmed a Game Boy video game about friendship with GB Studio)
April Cools' Club
Product review: Kvikk Lunsj
April Cools' Club
I guess I cook now
April Cools' Club
How to decorate a child's birthday cake
April Cools' Club
Digitisation is process optimisation
April Cools' Club
I listened to the 1001 (?) albums I should listen to before I die
April Cools' Club
3D-printing a Trombone
April Cools' Club
A non-exhaustive list of stuff I recommend
April Cools' Club
The Irrational Decision—A Book Review
April Cools' Club
How to Get Better at Guitar
April Cools' Club
Celebration of Sunshine
April Cools' Club
Personal Mineclonia World Tour
April Cools' Club
Spoken Latin
April Cools' Club
Language-learning anecdotes
April Cools' Club
This music seems to be in the air...
April Cools' Club
You should buy a meat slicer
April Cools' Club
Does Baby Have Hat
April Cools' Club
Dries van Noten in Five Looks
April Cools' Club
My coffee setup
April Cools' Club
My Adidas
April Cools' Club
Find joy in the boring bits of life
April Cools' Club
The Paris of our dreams
April Cools' Club
XORry Not Sorry: The Most Amusing Security Flaws I've Discovered
April Cools' Club
Gamer Games for Non-Gamers
April Cools' Club
Overengineering an Obsidian dashboard to get better at Marvel Snap
April Cools' Club
Leveraging Spaced Repetition to Power My Weekly Newsletter
April Cools' Club
nimi sin
April Cools' Club
Impulse Purchases
April Cools' Club
Egg mayo sandwich optimisation
April Cools' Club
Dynamic Graphs
April Cools' Club
How to Run a Table Top Roleplaying Meetup
April Cools' Club
A rough review of Capers Jones' Applied Software Measurement
April Cools' Club
Come non ho riparato la lavatrice che si riempiva d’acqua da spenta (How I didn't fix the washing machine that filled with water while switched off)
April Cools' Club
What's the yield on my stonks
April Cools' Club
Tout le monde déteste l'IA (Everyone hates AI)
April Cools' Club
A New Hope
April Cools' Club
The WiFi only works when it's raining
April Cools' Club
Choir rehearsal score locations
April Cools' Club
Some easy recipes
April Cools' Club
The Tale of Daniel
April Cools' Club
How I became a gardener
April Cools' Club
Books, Games and Movies
April Cools' Club
Decaf is good, actually
April Cools' Club
Yeah, I Skate(board)
April Cools' Club
A tour of my screenshots folder
April Cools' Club
Kratky in the basement
April Cools' Club
Takerufuji made history
April Cools' Club
Adaptive Plasticity and Life History Theory
April Cools' Club
Unusual Tips for Parenting Toddlers
April Cools' Club
Can it Creami?
April Cools' Club
the saga of Nat
April Cools' Club
Making crochet cacti
April Cools' Club
The Spice Didn't Always Flow
April Cools' Club
Discovering coffee in Toulouse?
April Cools' Club
Mediocrity can be a sign of excellence
April Cools' Club
Ten weird things you can buy online (and why you would)
April Cools' Club
Simple chicken rice
April Cools' Club
We're Knot Friends
April Cools' Club
What's in a username?
April Cools' Club
100 Incredible Tofu Recipes
April Cools' Club
The right tempo for renaissance polyphony
April Cools' Club
Marathon food
April Cools' Club
I ❤️ Microscopes
April Cools' Club
Cocktails
April Cools' Club
To ace exams, practice the easy questions
April Cools' Club
On Error
April Cools' Club
Midaregami
April Cools' Club
Ubj gb ernq EBG13 (How to read ROT13)
April Cools' Club
You Should Charge More
April Cools' Club
Coffee and Me: A Seven Year Love Affair
April Cools' Club
Vihaan tekoälyä (I hate AI)
moser’s frame shop
Я ненавижу ИИ (I hate AI)
moser’s frame shop
Je hais l’IA. (I hate AI.)
moser’s frame shop
Odio la IA (I hate AI)
moser’s frame shop
Odio l’IA (I hate AI)
moser’s frame shop
The Kirby Frame
moser’s frame shop
Eu Odeio IA (I hate AI)
moser’s frame shop
I Am An AI Hater
moser’s frame shop
Life During Class Wartime
ongoing by Tim Bray
Corey’s Captives
ongoing by Tim Bray
Spring Evening
ongoing by Tim Bray
Password Manager Angst
ongoing by Tim Bray
Long Links
ongoing by Tim Bray
Nash Burns Saves the Day
ongoing by Tim Bray
Pure Sound Please
ongoing by Tim Bray
Because Algospeak
ongoing by Tim Bray
Kansas and AI
ongoing by Tim Bray
Crocuses of 2026
ongoing by Tim Bray
Open Source and GenAI?
ongoing by Tim Bray
Quamina + Claude, Case 2
ongoing by Tim Bray
Quamina + Claude, Case 1
ongoing by Tim Bray
Long Links
ongoing by Tim Bray
Quamina v2.0.0
ongoing by Tim Bray
Losing 1½ Million Lines of Go
ongoing by Tim Bray
Regexp Lessons
ongoing by Tim Bray
Humanist Plumbing
ongoing by Tim Bray
After the Bubble
ongoing by Tim Bray
Tracy Numbers
ongoing by Tim Bray
Hearts and Minds: An Ambivalent Review of “Project Hail Mary”
No Moods, Ads or Cutesy Fucking Icons
Periscope Depth
No Moods, Ads or Cutesy Fucking Icons
The Plur1bus Solution
No Moods, Ads or Cutesy Fucking Icons
Siren Songs
No Moods, Ads or Cutesy Fucking Icons
No Obituary. Just an End.
No Moods, Ads or Cutesy Fucking Icons
It Awaits Your Experiments.
No Moods, Ads or Cutesy Fucking Icons
A Synopsis of Squid
No Moods, Ads or Cutesy Fucking Icons
Beautiful Things.
No Moods, Ads or Cutesy Fucking Icons
Perplexity: Hail Mary
No Moods, Ads or Cutesy Fucking Icons
Outtake
No Moods, Ads or Cutesy Fucking Icons
Hope for the New Year.
No Moods, Ads or Cutesy Fucking Icons
Born in Pain and Sweat and Pee: the 2024 Gallery Update
No Moods, Ads or Cutesy Fucking Icons
“The Pilot Enters the Core.
No Moods, Ads or Cutesy Fucking Icons
Ass Man.
No Moods, Ads or Cutesy Fucking Icons
Meet the New Boss. Same as the Old Boss.
No Moods, Ads or Cutesy Fucking Icons
Some People Just Want to Watch the Internet Burn.
No Moods, Ads or Cutesy Fucking Icons
The Three-Bragger Problem
No Moods, Ads or Cutesy Fucking Icons
Two-Step Forwards, Ten Years Back
No Moods, Ads or Cutesy Fucking Icons
Alevtina and Tamara and Lyonka, Oh My!
No Moods, Ads or Cutesy Fucking Icons
Meet the New Boss. Same as the Old Boss.
No Moods, Ads or Cutesy Fucking Icons
macOS Terminal - still missing the mark, Apple!
/dev/dump
Golang sync.Cond vs. Channel...
/dev/dump
Go modules, so much promise, so much busted
/dev/dump
Letter to Duncan Hunter (Immigration)
/dev/dump
Self Publishing Lessons
/dev/dump
Altering the deal... again....
/dev/dump
Not Abandoning GitHub *yet*
/dev/dump
Microsoft Buying GitHub Would be Bad
/dev/dump
No, Nanomsg is NOT dead
/dev/dump
Why I'm Boycotting Crypto Currencies
/dev/dump
Small Business Accounting Software Woes
/dev/dump
TLS close-notify .... what were they thinking?
/dev/dump
CMake ExternalProject_add In Libraries
/dev/dump
Licensing... again....
/dev/dump
MacOS X Mystery (Challenge)
/dev/dump
Security Advice to IoT Firmware Engineers
/dev/dump
Microsoft Hates My Name (Not Me, Just My Name)
/dev/dump
Leaving github
/dev/dump
Stepping Down
/dev/dump
What Microsoft Can Do to Make Me Hate Windows a Little Less
/dev/dump
On Misunderstandings
/dev/dump
A Space Shooter in Curses
/dev/dump
Fun with terminals, character sets, Unicode, and Go
/dev/dump
Tcell - Terminal functionality for Pure Go apps
/dev/dump
On Go, Portability, and System Interfaces
/dev/dump
Elevation Correction
Alex Harsányi
A Racket Array Tutorial
Alex Harsányi
Pumpkin Plot
Alex Harsányi
The Wolf, the Goat, and the Cabbage
Alex Harsányi
Timezone Lookup Revisited
Alex Harsányi
Synchronizing FIT files using a Raspberry Pi
Alex Harsányi
Heat Maps Revisited
Alex Harsányi
Asteroids (Gameplay)
Alex Harsányi
Asteroids (Game Engine)
Alex Harsányi
Screenshots
Alex Harsányi
Who Owns the Fish?
Alex Harsányi
Shaded Area Plot
Alex Harsányi
Box and Whiskers Plot
Alex Harsányi
Climb Analysis Tool
Alex Harsányi
Plot Animations
Alex Harsányi
Space Invaders
Alex Harsányi
Rendering the World Map Using the Racket Plot Package
Alex Harsányi
Barometric Altitude Measurement
Alex Harsányi
Automating Tests for the Plot Package
Alex Harsányi
Ishido
Alex Harsányi
Markdown View using the Racket editor%
Alex Harsányi
Dependency Management in Racket Applications
Alex Harsányi
Threshold Analysis in ActivityLog2
Alex Harsányi
A Game of Tetris (user interface)
Alex Harsányi
A Game of Tetris (gameplay)
Alex Harsányi
Dual Axis Plots
Alex Harsányi
Custom Rackunit Test Runner
Alex Harsányi
Timezone Aware Local Time
Alex Harsányi
Interactive Heat Maps
Alex Harsányi
Racket Binary Packages
Alex Harsányi
Interactive Maps in the DrRacket REPL
Alex Harsányi
More Timezone Lookup (loading and saving data)
Alex Harsányi
Timezone Lookup (an adventure in program optimization)
Alex Harsányi
Timezone Visualization
Alex Harsányi
Build Racket Packages with Azure Pipelines
Alex Harsányi
Building a GUI Application for the Password Generator
Alex Harsányi
Writing a Simple Password Generator in Racket
Alex Harsányi
An Overview of Common Racket Data Structures
Alex Harsányi
Building a Data Visualization Dashboard in Racket
Alex Harsányi
An enhanced text-field% GUI control for Racket
Alex Harsányi
Chess Game Using Racket's Pasteboard (part 3)
Alex Harsányi
Chess Game Using Racket's Pasteboard (part 2)
Alex Harsányi
Chess Game Using Racket's Pasteboard
Alex Harsányi
Racket Data Frame Package
Alex Harsányi
A Racket GUI Widget to display maps based on OpenStreetMap tiles
Alex Harsányi
Running and Cycling Workout Editor
Alex Harsányi
Arduino 433MHz Receiver -- Reading Keyfobs
Alex Harsányi
Interactive Overlays With the Racket Plot Package -- Update
Alex Harsányi
Arduino Inclinometer Improvements
Alex Harsányi
Interactive Overlays With the Racket Plot Package
Alex Harsányi
Changing Built-in Racket Packages
Alex Harsányi
Equipment Usage and Costs
Alex Harsányi
Running and Outdoor Temperature
Alex Harsányi
Arduino Inclinometer
Alex Harsányi
Fatigue and Running Form
Alex Harsányi
Quantifying Fatigue
Alex Harsányi
Bike Trainer
Alex Harsányi
Marathon Training 2017 Statistics
Alex Harsányi
Introducing ActivityLog2
Alex Harsányi
Making myself uncomfortable again
Andreas Kling
MutexProtected: A C++ Pattern for Easier Concurrency
Andreas Kling
Excellence is a habit, but so is failure
Andreas Kling
How SerenityOS declares ssize_t
Andreas Kling
15 Minutes Every Day
Andreas Kling
How I make a living working on SerenityOS
Andreas Kling
Ladybird: A new cross-platform browser project
Andreas Kling
Memory safety for SerenityOS
Andreas Kling
I quit my job to focus on SerenityOS full time
Andreas Kling
Smarter C/C++ inlining with __attribute__((flatten))
Andreas Kling
X84 Telnet Server
BogBoa
New Development Server
BogBoa
For the Love of Coffee, Gadgets, and Python
BogBoa
Desktop Linux
BogBoa
Remove the "close window?" prompt from Gnome-Terminal
BogBoa
The Web Client
BogBoa
Unpacking WebSocket Frames Cont.
BogBoa
Unpacking a WebSocket Frame
BogBoa
WebSocket RFC 6455 Handshake
BogBoa
WebSockets
BogBoa
Zomborgs
BogBoa
Failure is an Option
BogBoa
Character Work
BogBoa
Thoughts on Serialization
BogBoa
Crawling a Graph
BogBoa
Visualizing Data
BogBoa
The Universe is a Diamond
BogBoa
Python 3
BogBoa
WebSockets
BogBoa
Chrome WebSocket Protocol Update
BogBoa
HTTP Server
BogBoa
Netboa
BogBoa
The Plague that struck Azeroth
BogBoa
Not Actually Dead
BogBoa
Thinking about Miniboa 2.0
BogBoa
The Cold War Roots of the African Swine Flu Plague
China Matters
Wormwood and Gall: The Frank Olson Story that Errol Morris Missed
China Matters
The Trillion-Dollar Grift: The Long-Term Plan for US-China Decoupling
China Matters
The Crimes of Lola Montes
China Matters
China and the Libyan Muddle and Why Qaddafi Went Down
China Matters
80 Years of Injustice: The Joint, Serial, and Ongoing Betrayal of Korea by the United States and Japan
China Matters
Coddling Japan and Coveting Okinawa: Kennan and MacArthur set the course of North Asian history post-World War II
China Matters
America’s Blueprint for War in the South China Sea
China Matters
What I Witnessed in 1989 in Beijing
China Matters
“Vice”, Dick Cheney’s Ghost, and the Lies of America's Team China War
China Matters
October 2018 Taiwan Mainland Affairs Council Public Opinion Polling on Cross Strait Relations
China Matters
Debunking the China Debt Trap Myth, Sri Lanka/Hambantota Edition
China Matters
The Twelve Days of Christmas...and Elvis
China Matters
Joseph Trento's Report on the Pivotal US Role in Creating Japan's Plutonium Stockpile
China Matters
Posited link between Zhang Shoucheng suicide and Meng Wanzhou arrest (in Chinese)
China Matters
Sri Lanka, Rajapaksa--and China--Back in Geopolitical Play
China Matters
August 2018 Republic of China Mainland Affairs Council Survey of Popular Attitudes on Cross Strait Relations
China Matters
"Little Reunion": Eileen Chang gets another turn in the revisionist meatgrinder
China Matters
Hey! What About Term Limits for the Chinese Communist Party, Xi Jinping??
China Matters
Who Lost China? The Secret War Between Hillary Clinton and Barack Obama
China Matters
Atomic I/O letters column #164
Dan's Data
Atomic I/O letters column #163
Dan's Data
Atomic I/O letters column #162
Dan's Data
Atomic I/O letters column #161
Dan's Data
Atomic I/O letters column #160
Dan's Data
Atomic I/O letters column #159
Dan's Data
Atomic I/O letters column #158
Dan's Data
Atomic I/O letters column #157
Dan's Data
Atomic I/O letters column #156
Dan's Data
Atomic I/O letters column #155
Dan's Data
Atomic I/O letters column #154
Dan's Data
Atomic I/O letters column #153
Dan's Data
Atomic I/O letters column #152
Dan's Data
A comforting lie
Dan's Data
Of course you'd download a car. Or a gun!
Dan's Data
Atomic I/O letters column #151
Dan's Data
Atomic I/O letters column #150
Dan's Data
Atomic I/O letters column #149
Dan's Data
Atomic I/O letters column #148
Dan's Data
Atomic I/O letters column #147
Dan's Data
Money for nothing
Dan's Data
Atomic I/O letters column #146
Dan's Data
Atomic I/O letters column #145
Dan's Data
I get letters
Dan's Data
Random... ish... numbers
Dan's Data
Righteous bits
Dan's Data
Atomic I/O letters column #144
Dan's Data
Science versus SoftRAM
Dan's Data
Seeing past the normal
Dan's Data
Atomic I/O letters column #143
Dan's Data
Atomic I/O letters column #142
Dan's Data
Atomic I/O letters column #141
Dan's Data
On the h4xx0ring of p4sswordZ
Dan's Data
Warfare. Aliens. Car crashes. ENTERTAINMENT!
Dan's Data
Atomic I/O letters column #140
Dan's Data
Review: Noontec GigaLink N5 network storage box
Dan's Data
Atomic I/O letters column #139
Dan's Data
Socialised entertainment
Dan's Data
Atomic I/O letters column #138
Dan's Data
Boing!
Dan's Data
Identical voices and phantom swords
Dan's Data
Atomic I/O letters column #137
Dan's Data
Review: MC Saite MC-086 mouse
Dan's Data
Atomic I/O letters column #136
Dan's Data
Atomic I/O letters column #135
Dan's Data
If it looks random, it probably isn't
Dan's Data
A deadly mouse trap
Dan's Data
Atomic I/O letters column #134
Dan's Data
Pathfinding to everywhere
Dan's Data
15.16 thousand megabytes per dollar
Dan's Data
Grinding myself down
Dan's Data
Dan's Data letters #210
Dan's Data
Atomic I/O letters column #133
Dan's Data
Stomp, don't sprint!
Dan's Data
Atomic I/O letters column #132
Dan's Data
Review: Miyabi 613 hunting knife
Dan's Data
Atomic I/O letters column #131
Dan's Data
Welcome to my museum
Dan's Data
Atomic I/O letters column #130
Dan's Data
Welcome to dreamland
Dan's Data
Atomic I/O letters column #129
Dan's Data
When you have eliminated the impossible...
Dan's Data
Of magic lanterns, and MMORPGs
Dan's Data
The death of the manual
Dan's Data
Review: PCsensor FS1_P USB foot switch
Dan's Data
Atomic I/O letters column #128
Dan's Data
Dan's Data letters #209
Dan's Data
Atomic I/O letters column #127
Dan's Data
Atomic I/O letters column #126
Dan's Data
Filenames.WTF
Dan's Data
Atomic I/O letters column #125
Dan's Data
In Praise of the Fisheye
Dan's Data
Atomic I/O letters column #124
Dan's Data
A modest censorship proposal
Dan's Data
Atomic I/O letters column #123
Dan's Data
Atomic I/O letters column #122
Dan's Data
Stuck in the foothills
Dan's Data
Atomic I/O letters column #121
Dan's Data
The newt hits! You die...
Dan's Data
Have you wasted enough time today?
Dan's Data
Atomic I/O letters column #120
Dan's Data
Atomic I/O letters column #119
Dan's Data
Big Brother is watching you play
Dan's Data
Dan's Data letters #208
Dan's Data
Dan's Data letters #207
Dan's Data
One-note NPCs
Dan's Data
Cannibalise the corpses!
Dan's Data
Atomic I/O letters column #118
Dan's Data
Atomic I/O letters column #117
Dan's Data
Atomic I/O letters column #116
Dan's Data
Five trillion bits flying in loose formation
Dan's Data
Atomic I/O letters column #115
Dan's Data
Game crazy
Dan's Data
Alt-tCRASH
Dan's Data
Speed kings
Dan's Data
The daily grind
Dan's Data
Rustfmt-ing Rust
Featherweight Musings
My Git and GitHub work flow
Featherweight Musings
rustfmt - call for contributions
Featherweight Musings
Contributing to Rust
Featherweight Musings
New tutorial - arrays and vectors in Rust
Featherweight Musings
Graphs in Rust
Featherweight Musings
Creating a drop-in replacement for the Rust compiler
Featherweight Musings
Recent syntactic changes to Rust
Featherweight Musings
My thoughts on Rust in 2015
Featherweight Musings
rustaceans.org
Featherweight Musings
Notes on training for sport
Featherweight Musings
Thoughts on numeric types
Featherweight Musings
A gotcha with raw pointers and unsafe code
Featherweight Musings
LibHoare - pre- and postconditions in Rust
Featherweight Musings
Rust for C++ programmers - part 9: destructuring pt2 - match and borrowing
Featherweight Musings
Rust for C++ programmers - part 8: destructuring
Featherweight Musings
Rust for C++ programmers - part 7: data types
Featherweight Musings
Rust for C++ programmers - part 6: Rc, Gc, and * pointers
Featherweight Musings
Rust for C++ programmers - part 5: borrowed references
Featherweight Musings
A thought on language design
Featherweight Musings
Rust for C++ programmers - part 4: unique pointers
Featherweight Musings
Rust for C++ programmers - part 3: primitive types and operators
Featherweight Musings
Formatting change
Featherweight Musings
Rust for C++ programmers - part 2: control flow
Featherweight Musings
Rust for C++ programmers - an intermission - why Rust
Featherweight Musings
Cosa è Andato al Prada Doppio Club di Miami (What Went Down at the Prada Double Club in Miami)
greg.org: the making of
Rothko & Parsons At The National Gallery, Curated By Bunny Mellon
greg.org: the making of
Better Read #018: Ellsworth Kelly, Notes of 1969
greg.org: the making of
I Found An Object And Presented It As Itself Alone
greg.org: the making of
Our Guernica Cycle - EB-5, 05.06.2017
greg.org: the making of
Untitled (Mnuchin Gallery), 2017?
greg.org: the making of
Three Charts Presented In Order Of Increasing Credibility
greg.org: the making of
Ellsworth Kelly Dancing Monkey
greg.org: the making of
UPDATE: Our Guernica Cycle - Ivanka / Merkel 03.17.2017
greg.org: the making of
Talking Walter Hopps, Ferus, & LA with Anne Doran & Deborah Treisman, 10/29 @Alden Projects
greg.org: the making of
Better Read #017: Embroidery Trouble Shooting Guide
greg.org: the making of
Tommy Hilfiger Capo Personale
greg.org: the making of
Untitled (Presidential Seal), 2017
greg.org: the making of
Statement-As-Question: How Do You Get Here? From How Is Art History Made?
greg.org: the making of
RIP Vern Blosum
greg.org: the making of
Untitled (We Privatized All Of Versailles), 2017
greg.org: the making of
Untitled (Mnuchin Gallery), 2017
greg.org: the making of
Untitled (Boxwood Maze), 1967/2017
greg.org: the making of
Erased Kassay JPEG
greg.org: the making of
Ruth Asawa BMC Laundry Stamp Drawings
greg.org: the making of
Rust and dynamically-sized thin pointers
John Millikin
vu128: Efficient variable-length integers
John Millikin
Creating TUN/TAP interfaces in Linux
John Millikin
Running SunOS 4 in QEMU (SPARC)
John Millikin
Improved UNIX socket networking in QEMU 7.2
John Millikin
Debugging Win32 binaries in Ghidra via Wine
John Millikin
Running BeOS 5 in QEMU (i386)
John Millikin
Gmail accepts forged YouTube emails
John Millikin
Compacting Lunr search indices
John Millikin
JSON is not a YAML subset
John Millikin
Stateless Kubernetes overlay networks with IPv6
John Millikin
Extending VSCode with WebAssembly
John Millikin
Notes on cross-compiling Rust
John Millikin
First impressions of Rust
John Millikin
Commentary on “Stop Using Encrypted Email”
John Millikin
By any other CNAME
John Millikin
SRE School: No Haunted Forests
John Millikin
(More) Effective Go
John Millikin
Error Beneath the WAVs
John Millikin
Why I Ripped The Same CD 300 Times
John Millikin
Effective gRPC
John Millikin
Bazel School: Toolchains
John Millikin
Mojibake in Surugaya Javascript
John Millikin
UNIX Syscalls
John Millikin
SRE School: Health Checking
John Millikin
Reddit Front Page (2018)
John Millikin
Re:Creators Episode 21
John Millikin
SRE School: Instrumentation
John Millikin
haskell-cpython: Calling Python libraries from Haskell
John Millikin
Monad is not difficult
John Millikin
Understanding Iteratees
John Millikin
Replay
NSHipster
Manim
NSHipster
@isolated(any)
NSHipster
Uncertain⟨T⟩
NSHipster
Model Context Protocol (MCP)
NSHipster
Ollama
NSHipster
op run
NSHipster
As We May Code
NSHipster
WWDC 2020
NSHipster
Language Server Protocol
NSHipster
So Long, Prog21
Programming in the 21st Century
Writing Video Games in a Functional Style
Programming in the 21st Century
Progress Bars are Surprisingly Difficult
Programming in the 21st Century
Rovyvon A5R flashlight diagram
Push.cx
TypeID in Lua
Push.cx
Broken Poker
Push.cx
TV Setup
Push.cx
Google Ad Injection
Push.cx
Streaming Weekly Lobsters Office Hours
Push.cx
Discord vs IRC Rough Notes
Push.cx
Wrapping Large-Scale Refactors
Push.cx
NixOS on prgmr and Failing to Learn Nix
Push.cx
House Rules
Push.cx
***
Rafał Pastuszak
Hummingbirds are Evil! Procrastination, Laziness and Play
Rafał Pastuszak
Short: WiP
Rafał Pastuszak
Sit.
Rafał Pastuszak
Emotive Conjugation
Rafał Pastuszak
useRainbow()
Rafał Pastuszak
No Such Thing as a Fish
Rafał Pastuszak
Short: Retrofuturetrospectives
Rafał Pastuszak
Code sober, debug drunk
Rafał Pastuszak
Pair Programming with Snakes
Rafał Pastuszak
Reactive Hole
Rafał Pastuszak
Come and say hi
Rafał Pastuszak
Ensō
Rafał Pastuszak
I want a good parallel computer
Raph Levien’s blog
A note on Metal shader converter
Raph Levien’s blog
Simplifying Bézier paths
Raph Levien’s blog
Moving from Rust to C++
Raph Levien’s blog
Requiem for piet-gpu-hal
Raph Levien’s blog
Raph’s reflections and wishes for 2023
Raph Levien’s blog
Minikin retrospective
Raph Levien’s blog
Parallel curves of cubic Béziers
Raph Levien’s blog
Advice for the next dozen Rust GUIs
Raph Levien’s blog
Xilem: an architecture for UI in Rust
Raph Levien’s blog
Add SSL to your personal website
Seth Ladd's Blog
Dynamically load package contents with Dart's new Resource class
Seth Ladd's Blog
New Dart SDK helps eliminates symlinks
Seth Ladd's Blog
Null-aware operators in Dart
Seth Ladd's Blog
Formatting Dart code before every git commit
Seth Ladd's Blog
I ported a JavaScript app to Dart. Here's what I learned.
Seth Ladd's Blog
Speed Up Your Dart App's Initial Load With This Transformer
Seth Ladd's Blog
Angular and Polymer Data Binding, Together!
Seth Ladd's Blog
How to shrink the size of your Dart app when compiled to JavaScript
Seth Ladd's Blog
Compile-time dead code elimination with dart2js
Seth Ladd's Blog
Forms, HTTP servers, and Polymer with Dart
Seth Ladd's Blog
JavaZone Report. Spoiler: Awesome.
Seth Ladd's Blog
You complete me, unless you already have a Dart future
Seth Ladd's Blog
Polymer and Dart: A First Look
Seth Ladd's Blog
Two-way data binding with Web UI custom elements and models
Seth Ladd's Blog
Dart and Sencha Touch for Mobile Web Apps
Seth Ladd's Blog
Create unified interfaces across dart:io and dart:html
Seth Ladd's Blog
Call JavaScript from Dart - First Look
Seth Ladd's Blog
Forms, HTTP servers, and Web Components with Dart
Seth Ladd's Blog
Watch the video from What's New in Dart from Google I/O 2013
Seth Ladd's Blog
Lazy Load Libraries in Dart
Seth Ladd's Blog
Dynamically Load Code with Dart
Seth Ladd's Blog
6 Dart FAQs - Answered!
Seth Ladd's Blog
First Look at Dart Mixins
Seth Ladd's Blog
Dart on FLOSS Weekly from TWiT Network
Seth Ladd's Blog
Neocities Is Blocked by Bing
The Neocities Blog
Cleaner links for web pages
The Neocities Blog
File moving/renaming, faster web sites
The Neocities Blog
IPFS DNS Support
The Neocities Blog
Introducing the Neocities CLI
The Neocities Blog
The Net Neutrality Supporters Plan
The Neocities Blog
10x more free space
The Neocities Blog
Introducing Neocities Site Tipping
The Neocities Blog
We’re switching to default SSL
The Neocities Blog
HTTP is obsolete. It’s time for the Distributed Web
The Neocities Blog
Comet Ice
what if?
Star Ownership
what if?
Transatlantic Car Rental
what if?
Hailstones
what if?
Hot Banana
what if?
righteous animation
Cyberdelia NYC
RIP Dan McQuade
Cyberdelia NYC
the joybubbles of connecting
Cyberdelia NYC
party like it’s 1995
Cyberdelia NYC
just thought I would let you guys at 2600 know about this
Cyberdelia NYC
30 Years
Cyberdelia NYC
Cyberdelia t-shirts
Cyberdelia NYC
She Doesn’t Even Go Here!… oh wait, yes she does
Cyberdelia NYC
New York Times Archive webpage - August 10th 1988
Cyberdelia NYC
Kate Was Right About Computers
Cyberdelia NYC
30
Dioramas
29
Dioramas
28
Dioramas
27
Dioramas
26
Dioramas
25
Dioramas
24
Dioramas
23
Dioramas
22
Dioramas
21
Dioramas
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
untitled
renee french
The Clues Come Together in the Chaos (WUBRG Drafting)
Mediocre Magic 2026-05-15T01:21:48+00:00
Recency Bias
Vittles 2026-05-15T08:07:11+00:00
I am your dentist (mathemachicken)
Warp Door 2026-05-15T10:37:02+00:00
That’s So Cincinnati
Midwesterner 2026-05-15T11:01:26+00:00
untitled
https://jennifermillsnews.tumblr.com/ 2026-05-15T12:28:41+00:00
pennis (Sagun, Denny)
Warp Door 2026-05-15T12:54:36+00:00
New and Old #266
The Deleted Scenes 2026-05-15T12:56:05+00:00
The chopped Gym Shoe
Food is Stupid 2026-05-15T13:00:56+00:00
untitled
HORSEPUSSY GALORE 2026-05-15T14:13:46+00:00
untitled
HORSEPUSSY GALORE 2026-05-15T14:16:17+00:00
untitled
HORSEPUSSY GALORE 2026-05-15T14:18:19+00:00
untitled
HORSEPUSSY GALORE 2026-05-15T14:47:39+00:00
The Dragon’s Eye
FYFD 2026-05-15T15:00:00+00:00
Premium: What If...We're In An AI Bubble? (Part 1)
Ed Zitron's Where's Your Ed At 2026-05-15T16:44:27+00:00
CONDOR
Weird Fucking Games 2026-05-15T18:16:25+00:00
Filtered for bad addresses and good emotions
Interconnected 2026-05-15T19:10:00+00:00

Fsyncgate: errors on fsync are unrecoverable

Dan Luu

This is an archive of the original "fsyncgate" email thread. This is posted here because I wanted to have a link that would fit on a slide for <a href="http://danluu.com/deconstruct-files/">a talk on file safety</a> with <a href="http://danluu.com/web-bloat/">a mobile-friendly non-bloated format</a>. <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Subject:Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS Date:2018-03-28 02:23:46 </code></pre> Hi all Some time ago I ran into an issue where a user encountered data corruption after a storage error. PostgreSQL played a part in that corruption by allowing a checkpoint to complete despite what should've been a fatal error. TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at least on Linux. When fsync() returns success it means "all writes since the last fsync have hit disk" but we assume it means "all writes since the last SUCCESSFUL fsync have hit disk". Pg wrote some blocks, which went to OS dirty buffers for writeback. Writeback failed due to an underlying storage error. The block I/O layer and XFS marked the writeback page as failed (AS_EIO), but had no way to tell the app about the failure. When Pg called fsync() on the FD during the next checkpoint, fsync() returned EIO because of the flagged page, to tell Pg that a previous async write failed. Pg treated the checkpoint as failed and didn't advance the redo start position in the control file. All good so far. But then we retried the checkpoint, which retried the fsync(). The retry succeeded, because the prior fsync() cleared the AS_EIO bad page flag. The write never made it to disk, but we completed the checkpoint, and merrily carried on our way. Whoops, data loss. The clear-error-and-continue behaviour of fsync is not documented as far as I can tell. Nor is fsync() returning EIO unless you have a very new Linux man-pages with the patch I wrote to add it. 
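A toy model may help make the sequence above concrete. The C below is not kernel code: the struct and function names (toy_mapping, toy_write, toy_writeback, toy_fsync) are hypothetical, and the model encodes only the behaviour described in this thread, namely that a failed writeback leaves the page clean and sets an error flag (AS_EIO) that the next fsync() reports once and then clears.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one cached page; not how the kernel represents it. */
struct toy_page {
    bool dirty;    /* cache holds data newer than what is on disk */
    bool on_disk;  /* the latest data really reached storage */
};

struct toy_mapping {
    struct toy_page page;
    bool as_eio;        /* error flag, cleared once it has been reported */
    bool disk_broken;   /* simulates the failing storage */
};

/* Application write(): data lands in the page cache only. */
static void toy_write(struct toy_mapping *m)
{
    assert(m != NULL);
    m->page.dirty = true;
    m->page.on_disk = false;
}

/* Background writeback: on failure the page is marked clean anyway
 * and the failure survives only in the error flag. */
static void toy_writeback(struct toy_mapping *m)
{
    assert(m != NULL);
    if (!m->page.dirty)
        return;
    m->page.dirty = false;
    if (m->disk_broken)
        m->as_eio = true;
    else
        m->page.on_disk = true;
}

/* fsync(): flush dirty pages, then report the flag once and clear it. */
static int toy_fsync(struct toy_mapping *m)
{
    toy_writeback(m);
    if (m->as_eio) {
        m->as_eio = false;
        return -EIO;
    }
    return 0;
}
```

Driving it with one write, a failed background writeback and two fsync() calls reproduces the reported pattern: the first call returns -EIO, the retry returns 0, and the page still never reached disk.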
But from what I can see in the POSIX standard we are not given any guarantees about what happens on fsync() failure at all, so we're probably wrong to assume that retrying fsync() is safe. If the server had been using ext3 or ext4 with errors=remount-ro, the problem wouldn't have occurred because the first I/O error would've remounted the FS and stopped Pg from continuing. But XFS doesn't have that option. There may be other situations where this can occur too, involving LVM and/or multipath, but I haven't comprehensively dug out the details yet. It proved possible to recover the system by faking up a backup label from before the first incorrectly-successful checkpoint, forcing redo to repeat and write the lost blocks. But ... what a mess. I posted about the underlying fsync issue here some time ago: <a href="https://stackoverflow.com/q/42434872/398670">https://stackoverflow.com/q/42434872/398670</a> but haven't had a chance to follow up about the Pg specifics. I've been looking at the problem on and off and haven't come up with a good answer. I think we should just PANIC and let redo sort it out by repeating the failed write when it repeats work since the last checkpoint. The API offered by async buffered writes and fsync offers us no way to find out which page failed, so we can't just selectively redo that write. I think we do know the relfilenode associated with the fd that failed to fsync, but not much more. So the alternative seems to be some sort of potentially complex online-redo scheme where we replay WAL only for the relation on which we had the fsync() error, while otherwise servicing queries normally. That's likely to be extremely error-prone and hard to test, and it's trying to solve a case where on other filesystems the whole DB would grind to a halt anyway. 
I looked into whether we can solve it with use of the AIO API instead, but the mess is even worse there - from what I can tell you can't even reliably guarantee fsync at all on all Linux kernel versions. We already PANIC on fsync() failure for WAL segments. We just need to do the same for data forks at least for EIO. This isn't as bad as it seems because AFAICS fsync only returns EIO in cases where we should be stopping the world anyway, and many FSes will do that for us. There are rather a lot of pg_fsync() callers. While we could handle this case-by-case for each one, I'm tempted to just make pg_fsync() itself intercept EIO and PANIC. Thoughts? <hr /> <pre><code>From:Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Date:2018-03-28 03:53:08 </code></pre> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. <blockquote> Retrying fsync() is not OK at least on Linux. When fsync() returns success it means "all writes since the last fsync have hit disk" but we assume it means "all writes since the last SUCCESSFUL fsync have hit disk". </blockquote> If that's actually the case, we need to push back on this kernel brain damage, because as you're describing it fsync would be completely useless. Moreover, POSIX is entirely clear that successful fsync means all preceding writes for the file have been completed, full stop, doesn't matter when they were issued. <hr /> <pre><code>From:Michael Paquier <michael(at)paquier(dot)xyz> Date:2018-03-29 02:30:59 </code></pre> On Tue, Mar 27, 2018 at 11:53:08PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> Any callers of pg_fsync in the backend code are careful enough to check the returned status, sometimes doing retries like in mdsync, so what is proposed here would be a regression. 
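Craig's proposal to make pg_fsync() itself intercept EIO and PANIC could look roughly like the wrapper below. This is a hypothetical sketch, not PostgreSQL's actual pg_fsync(): fsync_or_panic is an invented name, and abort() stands in for a PANIC that forces crash recovery from WAL. Following the original proposal it treats only EIO as fatal; ENOSPC comes up later in the thread as a candidate for the same treatment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: an fsync() after which a caller can never retry
 * on EIO, because the process does not survive the error. */
static int fsync_or_panic(int fd)
{
    assert(fd >= 0);
    if (fsync(fd) == 0)
        return 0;
    if (errno == EIO) {
        /* The kernel may already have marked the failed pages clean and
         * cleared its error state, so a retried fsync() could falsely
         * succeed. Crash instead and let WAL replay redo the writes. */
        fprintf(stderr, "PANIC: could not fsync fd %d: %s\n",
                fd, strerror(errno));
        abort();
    }
    return -1;  /* other errors (EBADF, EINVAL, ...) are left to the caller */
}
```

The design choice is deliberate: once the error has been reported, nothing in the process can be trusted to know which writes were lost, so the only safe recovery point is the last checkpoint.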
<hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-03-29 02:48:27 </code></pre> On Thu, Mar 29, 2018 at 3:30 PM, Michael Paquier wrote: <blockquote> On Tue, Mar 27, 2018 at 11:53:08PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> Any callers of pg_fsync in the backend code are careful enough to check the returned status, sometimes doing retries like in mdsync, so what is proposed here would be a regression. </blockquote> Craig, is the phenomenon you described the same as the second issue "Reporting writeback errors" discussed in this article? <a href="https://lwn.net/Articles/724307/">https://lwn.net/Articles/724307/</a> "Current kernels might report a writeback error on an fsync() call, but there are a number of ways in which that can fail to happen." That's... I'm speechless. <hr /> <pre><code>From:Justin Pryzby <pryzby(at)telsasoft(dot)com> Date:2018-03-29 05:00:31 </code></pre> On Thu, Mar 29, 2018 at 11:30:59AM +0900, Michael Paquier wrote: <blockquote> On Tue, Mar 27, 2018 at 11:53:08PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> Any callers of pg_fsync in the backend code are careful enough to check the returned status, sometimes doing retries like in mdsync, so what is proposed here would be a regression. </blockquote> The retries are the source of the problem ; the first fsync() can return EIO, and also clears the error causing a 2nd fsync (of the same data) to return success. (Note, I can see that it might be useful to PANIC on EIO but retry for ENOSPC). On Thu, Mar 29, 2018 at 03:48:27PM +1300, Thomas Munro wrote: <blockquote> Craig, is the phenomenon you described the same as the second issue "Reporting writeback errors" discussed in this article? 
<a href="https://lwn.net/Articles/724307/">https://lwn.net/Articles/724307/</a> </blockquote> Worse, the article acknowledges the behavior without apparently suggesting to change it: "Storing that value in the file structure has an important benefit: it makes it possible to report a writeback error EXACTLY ONCE TO EVERY PROCESS THAT CALLS FSYNC() .... In current kernels, ONLY THE FIRST CALLER AFTER AN ERROR OCCURS HAS A CHANCE OF SEEING THAT ERROR INFORMATION." I believe I reproduced the problem behavior using dmsetup "error" target, see attached. strace looks like this (kernel is Linux 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux): <pre><code> 1 open("/dev/mapper/eio", O_RDWR|O_CREAT, 0600) = 3
 2 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 3 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 4 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 5 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 6 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 7 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 8 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 2560
 9 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = -1 ENOSPC (No space left on device)
10 dup(2) = 4
11 fcntl(4, F_GETFL) = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
12 brk(NULL) = 0x1299000
13 brk(0x12ba000) = 0x12ba000
14 fstat(4, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
15 write(4, "write(1): No space left on devic"..., 34write(1): No space left on device
16 ) = 34
17 close(4) = 0
18 fsync(3) = -1 EIO (Input/output error)
19 dup(2) = 4
20 fcntl(4, F_GETFL) = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
21 fstat(4, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
22 write(4, "fsync(1): Input/output error\n", 29fsync(1): Input/output error
23 ) = 29
24 close(4) = 0
25 close(3) = 0
26 open("/dev/mapper/eio", O_RDWR|O_CREAT, 0600) = 3
27 fsync(3) = 0
28 write(3, "\0", 1) = 1
29 fsync(3) = 0
30 exit_group(0) = ?
</code></pre> 2: EIO isn't seen initially due to the writeback page cache; 9: ENOSPC due to small device; 18: original IO error reported by fsync, good; 25: the original FD is closed; 26: ...and file reopened; 27: fsync on file with still-dirty data+EIO returns success, BAD; 10, 19: I'm not sure why there's dup(2), I guess glibc thinks that perror should write to a separate FD (?) Also note, close() ALSO returned success, which you might think exonerates the 2nd fsync(), but I think may itself be problematic, no? In any case, the 2nd byte certainly never got written to DM error, and the failure status was lost following fsync(). I get the exact same behavior if I break after one write() loop, such as to avoid ENOSPC. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-03-29 05:06:22 </code></pre> On Thu, Mar 29, 2018 at 6:00 PM, Justin Pryzby wrote: <blockquote> The retries are the source of the problem ; the first fsync() can return EIO, and also clears the error causing a 2nd fsync (of the same data) to return success. </blockquote> What I'm failing to grok here is how that error flag even matters, whether it's a single bit or a counter as described in that patch. If write back failed, the page is still dirty. So all future calls to fsync() need to try to flush it again, and (presumably) fail again (unless it happens to succeed this time around). 
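Justin's attached program is not included in this archive, but a function producing roughly the syscall sequence in his strace might look like the following. This is a hypothetical reconstruction: run_repro and repro_result are invented names, and nblocks bounds the write loop, which Justin reports gives the same result as writing until ENOSPC. On a healthy filesystem all three fsync() calls should return 0; in the trace against the device-mapper "error" target, the first returned EIO and the two after reopening returned 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Return value of each fsync() in the sequence from the strace. */
struct repro_result {
    int first_fsync;   /* on the fd that issued the (failed) writes */
    int second_fsync;  /* right after reopening the same path */
    int third_fsync;   /* after writing one more byte */
};

/* Hypothetical reconstruction, not Justin's attached program. */
static struct repro_result run_repro(const char *path, int nblocks)
{
    static const char block[8192];      /* zero-filled, as in the trace */
    struct repro_result r = { -1, -1, -1 };
    int fd;

    assert(path != NULL && nblocks > 0);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return r;
    for (int i = 0; i < nblocks; i++)
        if (write(fd, block, sizeof block) < 0)
            break;                      /* e.g. ENOSPC on a small device */
    r.first_fsync = fsync(fd);          /* trace line 18: -1 EIO */
    close(fd);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return r;
    r.second_fsync = fsync(fd);         /* trace line 27: 0, error forgotten */
    if (write(fd, "", 1) == 1)          /* one NUL byte, trace line 28 */
        r.third_fsync = fsync(fd);      /* trace line 29: 0 */
    close(fd);
    return r;
}
```

To reproduce the bug itself, path would need to point at a dm "error" mapping such as the /dev/mapper/eio device in the trace; a regular file only exercises the happy path.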
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-03-29 05:25:51 </code></pre> On 29 March 2018 at 13:06, Thomas Munro wrote: <blockquote> On Thu, Mar 29, 2018 at 6:00 PM, Justin Pryzby wrote: <blockquote> The retries are the source of the problem ; the first fsync() can return EIO, and also clears the error causing a 2nd fsync (of the same data) to return success. </blockquote> What I'm failing to grok here is how that error flag even matters, whether it's a single bit or a counter as described in that patch. If write back failed, the page is still dirty. So all future calls to fsync() need to try to flush it again, and (presumably) fail again (unless it happens to succeed this time around). </blockquote> You'd think so. But it doesn't appear to work that way. You can see for yourself with the error device-mapper destination mapped over part of a volume. I wrote a test case here. <a href="https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear.c">https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear.c</a> I don't pretend the kernel behaviour is sane. And it's possible I've made an error in my analysis. But since I've observed this in the wild, and seen it in a test case, I strongly suspect that what I've described is just what's happening, brain-dead or no. Presumably the kernel marks the page clean when it dispatches it to the I/O subsystem and doesn't dirty it again on I/O error? I haven't dug that deep on the kernel side. See the stackoverflow post for details on what I found in kernel code analysis. 
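Thomas's question about whether the error state is "a single bit or a counter" refers to the errseq_t work described in the LWN article Justin quotes. The toy model below uses hypothetical names and is not the kernel's errseq_t API; it shows only the counter idea in its simplest form: each writeback failure bumps a per-mapping counter, each open file records the counter value it has already reported, and fsync() reports a new error once per file. In this simplified version a file opened after the error never sees it. Note that the counter changes who gets told about an error, not whether the failed pages are still dirty.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Per-mapping error state: a counter instead of a single flag. */
struct toy_errseq_mapping {
    unsigned long err_seq;   /* incremented on every writeback error */
    int last_err;            /* most recent error code, e.g. EIO */
};

/* Per-open-file state: the counter value already reported to it. */
struct toy_errseq_file {
    struct toy_errseq_mapping *m;
    unsigned long seen;
};

static void toy_record_error(struct toy_errseq_mapping *m, int err)
{
    assert(m != NULL);
    m->err_seq++;
    m->last_err = err;
}

/* Snapshot at open time: only later errors are reported to this file. */
static struct toy_errseq_file toy_open(struct toy_errseq_mapping *m)
{
    struct toy_errseq_file f = { m, m->err_seq };
    return f;
}

/* fsync(): report an error once per file if the counter has moved. */
static int toy_errseq_fsync(struct toy_errseq_file *f)
{
    assert(f != NULL && f->m != NULL);
    if (f->seen != f->m->err_seq) {
        f->seen = f->m->err_seq;
        return -f->m->last_err;
    }
    return 0;
}
```

With two files open before a failure, each sees -EIO exactly once, unlike the single flag, where only the first caller would.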
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-03-29 05:32:43 </code></pre> On 29 March 2018 at 10:48, Thomas Munro wrote: <blockquote> On Thu, Mar 29, 2018 at 3:30 PM, Michael Paquier wrote: <blockquote> On Tue, Mar 27, 2018 at 11:53:08PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> Any callers of pg_fsync in the backend code are careful enough to check the returned status, sometimes doing retries like in mdsync, so what is proposed here would be a regression. </blockquote> Craig, is the phenomenon you described the same as the second issue "Reporting writeback errors" discussed in this article? <a href="https://lwn.net/Articles/724307/">https://lwn.net/Articles/724307/</a> </blockquote> A variant of it, by the looks. The problem in our case is that the kernel only tells us about the error once. It then forgets about it. So yes, that seems like a variant of the statement: <blockquote> "Current kernels might report a writeback error on an fsync() call, but there are a number of ways in which that can fail to happen." That's... I'm speechless. </blockquote> Yeah. It's a bit nuts. I was astonished when I saw the behaviour, and that it appears undocumented. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-03-29 05:35:47 </code></pre> On 29 March 2018 at 10:30, Michael Paquier wrote: <blockquote> On Tue, Mar 27, 2018 at 11:53:08PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> Any callers of pg_fsync in the backend code are careful enough to check the returned status, sometimes doing retries like in mdsync, so what is proposed here would be a regression. </blockquote> I covered this in my original post. Yes, we check the return value. But what do we do about it? 
For fsyncs of heap files, we ERROR, aborting the checkpoint. We'll retry the checkpoint later, which will retry the fsync(). Which will now appear to succeed because the kernel forgot that it lost our writes after telling us the first time. So we do check the error code, which returns success, and we complete the checkpoint and move on. But we only retried the fsync, not the writes before the fsync. So we lost data. Or rather, failed to detect that the kernel did so, so our checkpoint was bad and could not be completed. The problem is that we keep retrying checkpoints without repeating the writes leading up to the checkpoint, and retrying fsync. I don't pretend the kernel behaviour is sane, but we'd better deal with it anyway. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-03-29 05:58:45 </code></pre> On 28 March 2018 at 11:53, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> No. I'm quite serious. Worse, we quite possibly have to do it for ENOSPC as well to avoid similar lost-page-write issues. It's not necessary on ext3/ext4 with errors=remount-ro, but that's only because the FS stops us dead in our tracks. I don't pretend it's sane. The kernel behaviour is IMO crazy. If it's going to lose a write, it should at minimum mark the FD as broken so no further fsync() or anything else can succeed on the FD, and an app that cares about durability must repeat the whole set of work since the prior successful fsync(). Just reporting it once and forgetting it is madness. But even if we convince the kernel folks of that, how do other platforms behave? And how long before these kernels are out of use? We'd better deal with it, crazy or no. Please see my StackOverflow post for the kernel-level explanation. Note also the test case link there. 
<a href="https://stackoverflow.com/a/42436054/398670">https://stackoverflow.com/a/42436054/398670</a> <blockquote> <blockquote> Retrying fsync() is not OK at least on Linux. When fsync() returns success it means "all writes since the last fsync have hit disk" but we assume it means "all writes since the last SUCCESSFUL fsync have hit disk". </blockquote> If that's actually the case, we need to push back on this kernel brain damage, because as you're describing it fsync would be completely useless. </blockquote> It's not useless, it's just telling us something other than what we think it means. The promise it seems to give us is that if it reports an error once, everything after that is useless, so we should throw our toys, close and reopen everything, and redo from the last known-good state. Though as Tomas posted below, it provides rather weaker guarantees than I thought in some other areas too. See that lwn.net article he linked. <blockquote> Moreover, POSIX is entirely clear that successful fsync means all preceding writes for the file have been completed, full stop, doesn't matter when they were issued. </blockquote> I can't find anything that says so to me. Please quote relevant spec. I'm working from <a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html">http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html</a> which states that "The fsync() function shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected." My reading is that POSIX does not specify what happens AFTER an error is detected. It doesn't say that error has to be persistent and that subsequent calls must also report the error. 
It also says: "If the fsync() function fails, outstanding I/O operations are not guaranteed to have been completed." but that doesn't clarify matters much either, because it can be read to mean that once there's been an error reported for some IO operations there's no guarantee those operations are ever completed even after a subsequent fsync returns success. I'm not seeking to defend what the kernel seems to be doing. Rather, saying that we might see similar behaviour on other platforms, crazy or not. I haven't looked past linux yet, though. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-03-29 12:07:56 </code></pre> On Thu, Mar 29, 2018 at 6:58 PM, Craig Ringer wrote: <blockquote> On 28 March 2018 at 11:53, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> No. I'm quite serious. Worse, we quite possibly have to do it for ENOSPC as well to avoid similar lost-page-write issues. </blockquote> I found your discussion with kernel hacker Jeff Layton at <a href="https://lwn.net/Articles/718734/">https://lwn.net/Articles/718734/</a> in which he said: "The stackoverflow writeup seems to want a scheme where pages stay dirty after a writeback failure so that we can try to fsync them again. Note that that has never been the case in Linux after hard writeback failures, AFAIK, so programs should definitely not assume that behavior." The article above that says the same thing a couple of different ways, ie that writeback failure leaves you with pages that are neither written to disk successfully nor marked dirty. If I'm reading various articles correctly, the situation was even worse before his errseq_t stuff landed. That fixed cases of completely unreported writeback failures due to sharing of PG_error for both writeback and read errors with certain filesystems, but it doesn't address the clean pages problem. 
Yeah, I see why you want to PANIC. <blockquote> <blockquote> Moreover, POSIX is entirely clear that successful fsync means all preceding writes for the file have been completed, full stop, doesn't matter when they were issued. </blockquote> I can't find anything that says so to me. Please quote relevant spec. I'm working from <a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html">http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html</a> which states that "The fsync() function shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected." My reading is that POSIX does not specify what happens AFTER an error is detected. It doesn't say that error has to be persistent and that subsequent calls must also report the error. It also says: </blockquote> FWIW my reading is the same as Tom's. It says "all data for the open file descriptor" without qualification or special treatment after errors. Not "some". <blockquote> I'm not seeking to defend what the kernel seems to be doing. Rather, saying that we might see similar behaviour on other platforms, crazy or not. I haven't looked past linux yet, though. </blockquote> I see no reason to think that any other operating system would behave that way without strong evidence... This is openly acknowledged to be "a mess" and "a surprise" in the Filesystem Summit article. I am not really qualified to comment, but from a cursory glance at FreeBSD's vfs_bio.c I think it's doing what you'd hope for... see the code near the comment "Failed write, redirty." 
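Craig's point that an application which cares about durability must repeat all work since the prior successful fsync(), combined with Jeff Layton's statement that failed pages end up neither written nor dirty, suggests that any retry has to come from a copy of the data the application owns. A minimal sketch, assuming the application can keep every unsynced write in memory (PostgreSQL would instead replay its WAL): durable_file, df_open, df_write and df_sync are hypothetical names, and a second failure is returned to the caller, who should treat it as fatal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DF_MAX_PENDING 64

/* One write issued since the last successful fsync(). */
struct pending_write {
    off_t off;
    size_t len;
    char *data;                 /* application-owned copy of the bytes */
};

struct durable_file {
    const char *path;           /* must outlive the struct; used to reopen */
    int fd;
    size_t npending;
    struct pending_write pending[DF_MAX_PENDING];
};

static int df_open(struct durable_file *f, const char *path)
{
    assert(f != NULL && path != NULL);
    f->path = path;
    f->npending = 0;
    f->fd = open(path, O_RDWR | O_CREAT, 0600);
    return f->fd < 0 ? -1 : 0;
}

/* pwrite() the data and keep our own copy until fsync() confirms it. */
static int df_write(struct durable_file *f, off_t off, const void *buf, size_t len)
{
    struct pending_write *p;

    if (f->npending == DF_MAX_PENDING)
        return -1;              /* caller should df_sync() first */
    p = &f->pending[f->npending];
    p->data = malloc(len);
    if (p->data == NULL)
        return -1;
    memcpy(p->data, buf, len);
    p->off = off;
    p->len = len;
    if (pwrite(f->fd, buf, len, off) != (ssize_t)len) {
        free(p->data);
        return -1;
    }
    f->npending++;
    return 0;
}

static void df_forget(struct durable_file *f)
{
    for (size_t i = 0; i < f->npending; i++)
        free(f->pending[i].data);
    f->npending = 0;
}

/* On fsync() failure, do not simply retry fsync(): reopen the file,
 * rewrite every pending write from our own copies, then fsync() once more. */
static int df_sync(struct durable_file *f)
{
    if (fsync(f->fd) == 0) {
        df_forget(f);
        return 0;
    }
    close(f->fd);
    f->fd = open(f->path, O_RDWR);
    if (f->fd < 0)
        return -1;
    for (size_t i = 0; i < f->npending; i++) {
        const struct pending_write *p = &f->pending[i];
        if (pwrite(f->fd, p->data, p->len, p->off) != (ssize_t)p->len)
            return -1;
    }
    if (fsync(f->fd) != 0)
        return -1;              /* still failing: treat as fatal */
    df_forget(f);
    return 0;
}
```

Rewriting the data makes the pages dirty again, so the second fsync() has real work to do instead of reporting on pages the kernel already considers clean. The cost is memory proportional to the amount of unsynced data.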
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-03-29 13:15:10 </code></pre> On 29 March 2018 at 20:07, Thomas Munro wrote: <blockquote> On Thu, Mar 29, 2018 at 6:58 PM, Craig Ringer wrote: <blockquote> On 28 March 2018 at 11:53, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. </blockquote> Surely you jest. </blockquote> No. I'm quite serious. Worse, we quite possibly have to do it for ENOSPC as well to avoid similar lost-page-write issues. </blockquote> I found your discussion with kernel hacker Jeff Layton at <a href="https://lwn.net/Articles/718734/">https://lwn.net/Articles/718734/</a> in which he said: "The stackoverflow writeup seems to want a scheme where pages stay dirty after a writeback failure so that we can try to fsync them again. Note that that has never been the case in Linux after hard writeback failures, AFAIK, so programs should definitely not assume that behavior." The article above that says the same thing a couple of different ways, ie that writeback failure leaves you with pages that are neither written to disk successfully nor marked dirty. If I'm reading various articles correctly, the situation was even worse before his errseq_t stuff landed. That fixed cases of completely unreported writeback failures due to sharing of PG_error for both writeback and read errors with certain filesystems, but it doesn't address the clean pages problem. Yeah, I see why you want to PANIC. </blockquote> In more ways than one ;) <blockquote> I'm not seeking to defend what the kernel seems to be doing. Rather, saying <blockquote> that we might see similar behaviour on other platforms, crazy or not. I haven't looked past linux yet, though. </blockquote> I see no reason to think that any other operating system would behave that way without strong evidence... This is openly acknowledged to be "a mess" and "a surprise" in the Filesystem Summit article. 
I am not really qualified to comment, but from a cursory glance at FreeBSD's vfs_bio.c I think it's doing what you'd hope for... see the code near the comment "Failed write, redirty." </blockquote> Ok, that's reassuring, but doesn't help us on the platform the great majority of users deploy on :( "If on Linux, PANIC" Hrm. <hr /> <pre><code>From:Catalin Iacob <iacobcatalin(at)gmail(dot)com> Date:2018-03-29 16:20:00 </code></pre> On Thu, Mar 29, 2018 at 2:07 PM, Thomas Munro wrote: <blockquote> I found your discussion with kernel hacker Jeff Layton at <a href="https://lwn.net/Articles/718734/">https://lwn.net/Articles/718734/</a> in which he said: "The stackoverflow writeup seems to want a scheme where pages stay dirty after a writeback failure so that we can try to fsync them again. Note that that has never been the case in Linux after hard writeback failures, AFAIK, so programs should definitely not assume that behavior." </blockquote> And a bit below in the same comments, to this question about PG: "So, what are the options at this point? The assumption was that we can repeat the fsync (which as you point out is not the case), or shut down the database and perform recovery from WAL", the same Jeff Layton seems to agree PANIC is the appropriate response: "Replaying the WAL synchronously sounds like the simplest approach when you get an error on fsync. These are uncommon occurrences for the most part, so having to fall back to slow, synchronous error recovery modes when this occurs is probably what you want to do.". And right after, he confirms the errseq_t patches are about always detecting this, not more: "The main thing I working on is to better guarantee is that you actually get an error when this occurs rather than silently corrupting your data. The circumstances where that can occur require some corner-cases, but I think we need to make sure that it doesn't occur." 
Jeff's comments in the pull request that merged errseq_t are worth reading as well: <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750</a> <blockquote> The article above that says the same thing a couple of different ways, ie that writeback failure leaves you with pages that are neither written to disk successfully nor marked dirty. If I'm reading various articles correctly, the situation was even worse before his errseq_t stuff landed. That fixed cases of completely unreported writeback failures due to sharing of PG_error for both writeback and read errors with certain filesystems, but it doesn't address the clean pages problem. </blockquote> Indeed, that's exactly how I read it as well (opinion formed independently before reading your sentence above). The errseq_t patches landed in v4.13 by the way, so very recently. <blockquote> Yeah, I see why you want to PANIC. </blockquote> Indeed. Even doing that leaves question marks about all the kernel versions before v4.13, which at this point is pretty much everything out there, not even detecting this reliably. This is messy. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-03-29 21:18:14 </code></pre> On Fri, Mar 30, 2018 at 5:20 AM, Catalin Iacob wrote: <blockquote> Jeff's comments in the pull request that merged errseq_t are worth reading as well: <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750</a> </blockquote> Wow. It looks like there may be a separate question of when each filesystem adopted this new infrastructure? <blockquote> <blockquote> Yeah, I see why you want to PANIC. 
</blockquote> Indeed. Even doing that leaves question marks about all the kernel versions before v4.13, which at this point is pretty much everything out there, not even detecting this reliably. This is messy. </blockquote> The pre-errseq_t problems are beyond our control. There's nothing we can do about that in userspace (except perhaps abandon OS-buffered IO, a big project). We just need to be aware that this problem exists in certain kernel versions and be grateful to Layton for fixing it. The dropped dirty flag problem is something we can and in my view should do something about, whatever we might think about that design choice. As Andrew Gierth pointed out to me in an off-list chat about this, by the time you've reached this state, both PostgreSQL's buffer and the kernel's buffer are clean and might be reused for another block at any time, so your data might be gone from the known universe -- we don't even have the option to rewrite our buffers in general. Recovery is the only option. Thank you to Craig for chasing this down and +1 for his proposal, on Linux only. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-03-31 13:24:28 </code></pre> On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote: <blockquote> <blockquote> <blockquote> Yeah, I see why you want to PANIC. </blockquote> Indeed. Even doing that leaves question marks about all the kernel versions before v4.13, which at this point is pretty much everything out there, not even detecting this reliably. This is messy. </blockquote> </blockquote> There may still be a way to reliably detect this on older kernel versions from userspace, but it will be messy whatsoever. On EIO errors, the kernel will not restore the dirty page flags, but it will flip the error flags on the failed pages. One could mmap() the file in question, obtain the PFNs (via /proc/pid/pagemap) and enumerate those to match the ones with the error flag switched on (via /proc/kpageflags). 
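The decoding half of that (explicitly untested) idea might look roughly like this. It is a hedged sketch: the bit positions follow the kernel's pagemap documentation (PFN in bits 0-54, "present" in bit 63, and bit 1 of a /proc/kpageflags word is KPF_ERROR). The helper names are made up, `vpn` is assumed to be the mmap()ed address divided by the page size, only resident, mapped pages report a PFN, and since Linux 4.0 real PFNs are only visible to readers with CAP_SYS_ADMIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* /proc/<pid>/pagemap: one 64-bit entry per virtual page.
 * Bits 0-54 hold the PFN when bit 63 ("present") is set. */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)
#define PM_PRESENT  (UINT64_C(1) << 63)
/* /proc/kpageflags: one 64-bit flags word per PFN; bit 1 is KPF_ERROR. */
#define KPF_ERROR_BIT 1

/* Read the idx-th 64-bit entry of a /proc table. Returns 0 on success. */
static int read_entry(int fd, uint64_t idx, uint64_t *out)
{
    assert(out != NULL);
    ssize_t n = pread(fd, out, sizeof *out, (off_t)(idx * sizeof *out));
    return n == (ssize_t)sizeof *out ? 0 : -1;
}

/* PFN of a pagemap entry, or 0 if the page is not resident
 * (or the PFN is hidden from an unprivileged reader). */
static uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

static bool kpageflags_has_error(uint64_t flags)
{
    return (flags >> KPF_ERROR_BIT) & 1u;
}

/* 1 if the page at virtual page number vpn is resident with its error flag
 * set, 0 if resident without it, -1 if this cannot be determined. */
static int page_has_io_error(int pagemap_fd, int kpageflags_fd, uint64_t vpn)
{
    uint64_t entry, flags;
    if (read_entry(pagemap_fd, vpn, &entry) != 0)
        return -1;
    uint64_t pfn = pagemap_pfn(entry);
    if (pfn == 0)
        return -1;
    if (read_entry(kpageflags_fd, pfn, &flags) != 0)
        return -1;
    return kpageflags_has_error(flags) ? 1 : 0;
}
```

Mapping a flagged page back to a file offset is then just (vpn - first_vpn) * page size for the mapping in question.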
This could serve at least as a detection mechanism, but one could also further use this info to logically map the pages that failed IO back to the original file offsets, and potentially retry IO just for those file ranges that cover the failed pages. Just an idea, not tested. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-03-31 16:13:09 </code></pre> On 31 March 2018 at 21:24, Anthony Iliopoulos wrote: <blockquote> On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote: <blockquote> <blockquote> <blockquote> Yeah, I see why you want to PANIC. </blockquote> Indeed. Even doing that leaves question marks about all the kernel versions before v4.13, which at this point is pretty much everything out there, not even detecting this reliably. This is messy. </blockquote> </blockquote> There may still be a way to reliably detect this on older kernel versions from userspace, but it will be messy whatsoever. On EIO errors, the kernel will not restore the dirty page flags, but it will flip the error flags on the failed pages. One could mmap() the file in question, obtain the PFNs (via /proc/pid/pagemap) and enumerate those to match the ones with the error flag switched on (via /proc/kpageflags). This could serve at least as a detection mechanism, but one could also further use this info to logically map the pages that failed IO back to the original file offsets, and potentially retry IO just for those file ranges that cover the failed pages. Just an idea, not tested. </blockquote> That sounds like a huge amount of complexity, with uncertainty as to how it'll behave kernel-to-kernel, for negligible benefit. I was exploring the idea of doing selective recovery of one relfilenode, based on the assumption that we know the filenode related to the fd that failed to fsync(). We could redo only WAL on that relation.
But it fails the same test: it's too complex for a niche case that shouldn't happen in the first place, so it'll probably have bugs, or grow bugs in bitrot over time. Remember, if you're on ext4 with errors=remount-ro, you get shut down even harder than a PANIC. So we should just use the big hammer here. I'll send a patch this week. <hr /> <pre><code>From:Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Date:2018-03-31 16:38:12 </code></pre> Craig Ringer writes: <blockquote> So we should just use the big hammer here. </blockquote> And bitch, loudly and publicly, about how broken this kernel behavior is. If we make enough of a stink maybe it'll get fixed. <hr /> <pre><code>From:Michael Paquier <michael(at)paquier(dot)xyz> Date:2018-04-01 00:20:38 </code></pre> On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> So we should just use the big hammer here. </blockquote> And bitch, loudly and publicly, about how broken this kernel behavior is. If we make enough of a stink maybe it'll get fixed. </blockquote> That won't fix anything released already, so as per the information gathered something has to be done anyway. The discussion of this thread is spreading quite a lot actually. Handling things at a low level looks like a better plan for the backend. Tools like pg_basebackup and pg_dump also issue fsync() calls on the data they create, so we should do an equivalent for them, with some exit() calls in file_utils.c. As of now failures are logged to stderr but not considered fatal. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-01 00:58:22 </code></pre> On Sun, Apr 01, 2018 at 12:13:09AM +0800, Craig Ringer wrote: <blockquote> On 31 March 2018 at 21:24, Anthony Iliopoulos <[1]ailiop(at)altatus(dot)com> wrote: <pre><code> On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote: > >> Yeah, I see why you want to PANIC. > > > > Indeed.
Even doing that leaves question marks about all the kernel > > versions before v4.13, which at this point is pretty much everything > > out there, not even detecting this reliably. This is messy. </code></pre> <blockquote> <pre><code> There may still be a way to reliably detect this on older kernel versions from userspace, but it will be messy whatsoever. On EIO errors, the kernel will not restore the dirty page flags, but it will flip the error flags on the failed pages. One could mmap() the file in question, obtain the PFNs (via /proc/pid/pagemap) and enumerate those to match the ones with the error flag switched on (via /proc/kpageflags). This could serve at least as a detection mechanism, but one could also further use this info to logically map the pages that failed IO back to the original file offsets, and potentially retry IO just for those file ranges that cover the failed pages. Just an idea, not tested. </code></pre> </blockquote> That sounds like a huge amount of complexity, with uncertainty as to how it'll behave kernel-to-kernel, for negligible benefit. </blockquote> Those interfaces have been around since the kernel 2.6 days and are rather stable, but I was merely responding to your original post comment regarding having a way of finding out which page(s) failed. I assume that indeed there would be no benefit, especially since those errors are usually not transient (typically they come from hard medium faults), and although a filesystem could theoretically mask the error by allocating a different logical block, I am not aware of any implementation that currently does that. <blockquote> I was exploring the idea of doing selective recovery of one relfilenode, based on the assumption that we know the filenode related to the fd that failed to fsync(). We could redo only WAL on that relation. But it fails the same test: it's too complex for a niche case that shouldn't happen in the first place, so it'll probably have bugs, or grow bugs in bitrot over time.
</blockquote> Fully agree, those cases should be sufficiently rare that a complex and possibly non-maintainable solution is not really warranted. <blockquote> Remember, if you're on ext4 with errors=remount-ro, you get shut down even harder than a PANIC. So we should just use the big hammer here. </blockquote> I am not entirely sure what you mean here: does Pg really treat write() errors as fatal? Also, the errors that ext4 detects with this option are at the superblock level and concern metadata rather than actual data writes (recall that those are buffered anyway, no actual device IO has to take place at the time of write()). <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-01 01:14:46 </code></pre> On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> So we should just use the big hammer here. </blockquote> And bitch, loudly and publicly, about how broken this kernel behavior is. If we make enough of a stink maybe it'll get fixed. </blockquote> It is not likely to be fixed (beyond what has been done already with the manpage patches and errseq_t fixes on the reporting level). The issue is, the kernel needs to deal with hard IO errors at that level somehow, and since those errors typically persist, re-dirtying the pages would not really solve the problem (unless some filesystem remaps the request to a different block, assuming the device is alive). Keeping around dirty pages that cannot possibly be written out is essentially a memory leak, as those pages would stay around even after the application has exited. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-01 18:24:51 </code></pre> On Fri, Mar 30, 2018 at 10:18 AM, Thomas Munro wrote: <blockquote> ... on Linux only. </blockquote> Apparently I was too optimistic.
I had looked only at FreeBSD, which keeps the page around and dirties it so we can retry, but the other BSDs apparently don't (FreeBSD changed that in 1999). From what I can tell from the sources below, we have: <pre><code>Linux, OpenBSD, NetBSD: retrying fsync() after EIO lies FreeBSD, Illumos: retrying fsync() after EIO tells the truth </code></pre> Maybe my drive-by assessment of those kernel routines is wrong and someone will correct me, but I'm starting to think you might be better to assume the worst on all systems. Perhaps a GUC that defaults to panicking, so that users on those rare OSes could turn that off? Even then I'm not sure if the failure mode will be that great anyway or if it's worth having two behaviours. Thoughts? <a href="http://mail-index.netbsd.org/netbsd-users/2018/03/30/msg020576.html">http://mail-index.netbsd.org/netbsd-users/2018/03/30/msg020576.html</a> <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/vfs_bio.c#L1059">https://github.com/NetBSD/src/blob/trunk/sys/kern/vfs_bio.c#L1059</a> <a href="https://github.com/openbsd/src/blob/master/sys/kern/vfs_bio.c#L867">https://github.com/openbsd/src/blob/master/sys/kern/vfs_bio.c#L867</a> <a href="https://github.com/freebsd/freebsd/blob/master/sys/kern/vfs_bio.c#L2631">https://github.com/freebsd/freebsd/blob/master/sys/kern/vfs_bio.c#L2631</a> <a href="https://github.com/freebsd/freebsd/commit/e4e8fec98ae986357cdc208b04557dba55a59266">https://github.com/freebsd/freebsd/commit/e4e8fec98ae986357cdc208b04557dba55a59266</a> <a href="https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/os/bio.c#L441">https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/os/bio.c#L441</a> <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-02 15:03:42 </code></pre> On 2 April 2018 at 02:24, Thomas Munro wrote: <blockquote> Maybe my drive-by assessment of those kernel routines is wrong and someone will correct me, but I'm starting to think you 
might be better to assume the worst on all systems. Perhaps a GUC that defaults to panicking, so that users on those rare OSes could turn that off? Even then I'm not sure if the failure mode will be that great anyway or if it's worth having two behaviours. Thoughts? </blockquote> I see little benefit to not just PANICing unconditionally on EIO, really. It shouldn't happen, and if it does, we want to be pretty conservative and adopt a data-protective approach. I'm rather more worried by doing it on ENOSPC. Which looks like it might be necessary from what I recall finding in my test case + kernel code reading. I really don't want to respond to a possibly-transient ENOSPC by PANICing the whole server unnecessarily. BTW, the support team at 2ndQ is presently working on two separate issues where ENOSPC resulted in DB corruption, though neither of them involve logs of lost page writes. I'm planning on taking some time tomorrow to write a torture tester for Pg's ENOSPC handling and to verify ENOSPC handling in the test case I linked to in my original StackOverflow post. If this is just an EIO issue then I see no point doing anything other than PANICing unconditionally. If it's a concern for ENOSPC too, we should try harder to fail more nicely whenever we possibly can. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-02 18:13:46 </code></pre> Hi, On 2018-04-01 03:14:46 +0200, Anthony Iliopoulos wrote: <blockquote> On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> So we should just use the big hammer here. </blockquote> And bitch, loudly and publicly, about how broken this kernel behavior is. If we make enough of a stink maybe it'll get fixed. </blockquote> It is not likely to be fixed (beyond what has been done already with the manpage patches and errseq_t fixes on the reporting level). 
The issue is, the kernel needs to deal with hard IO errors at that level somehow, and since those errors typically persist, re-dirtying the pages would not really solve the problem (unless some filesystem remaps the request to a different block, assuming the device is alive). </blockquote> Throwing away the dirty pages and persisting the error seems a lot more reasonable. Then provide a fcntl (or whatever) extension that can clear the error status in the few cases where the application wants to gracefully deal with the case. <blockquote> Keeping around dirty pages that cannot possibly be written out is essentially a memory leak, as those pages would stay around even after the application has exited. </blockquote> Why do dirty pages need to be kept around in the case of persistent errors? I don't think the lack of automatic recovery in that case is what anybody is complaining about. It's that the error goes away and there's no reasonable way to separate out such an error from some potential transient errors. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-02 18:53:20 </code></pre> On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-01 03:14:46 +0200, Anthony Iliopoulos wrote: <blockquote> On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: <blockquote> Craig Ringer writes: <blockquote> So we should just use the big hammer here. </blockquote> And bitch, loudly and publicly, about how broken this kernel behavior is. If we make enough of a stink maybe it'll get fixed. </blockquote> It is not likely to be fixed (beyond what has been done already with the manpage patches and errseq_t fixes on the reporting level). The issue is, the kernel needs to deal with hard IO errors at that level somehow, and since those errors typically persist, re-dirtying the pages would not really solve the problem (unless some filesystem remaps the request to a different block, assuming the device is alive).
</blockquote> Throwing away the dirty pages and persisting the error seems a lot more reasonable. Then provide a fcntl (or whatever) extension that can clear the error status in the few cases where the application wants to gracefully deal with the case. </blockquote> Given precisely that the dirty pages which cannot be written out are practically thrown away, the semantics of fsync() (after the 4.13 fixes) are essentially correct: the first call indicates that a writeback error indeed occurred, while subsequent calls have no reason to indicate an error (assuming no other errors occurred in the meantime). The error reporting is thus consistent with the intended semantics (which are sadly not properly documented). Repeated calls to fsync() simply do not imply that the kernel will retry writeback of the previously failed pages, so the application needs to be aware of that. Persisting the error at the fsync() level would essentially mean moving application policy into the kernel. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-02 19:32:45 </code></pre> On 2018-04-02 20:53:20 +0200, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote: <blockquote> Throwing away the dirty pages and persisting the error seems a lot more reasonable. Then provide a fcntl (or whatever) extension that can clear the error status in the few cases where the application wants to gracefully deal with the case. </blockquote> Given precisely that the dirty pages which cannot be written out are practically thrown away, the semantics of fsync() (after the 4.13 fixes) are essentially correct: the first call indicates that a writeback error indeed occurred, while subsequent calls have no reason to indicate an error (assuming no other errors occurred in the meantime). </blockquote> Meh^2. "no reason" - except that there's absolutely no way to know what state the data is in.
And that your application needs explicit handling of such failures. And that one FD might be used in lots of different parts of the application, that fsyncs in one part of the application might be an ok failure, and in another not. Requiring explicit actions to acknowledge "we've thrown away your data for unknown reason" seems entirely reasonable. <blockquote> The error reporting is thus consistent with the intended semantics (which are sadly not properly documented). Repeated calls to fsync() simply do not imply that the kernel will retry writeback of the previously failed pages, so the application needs to be aware of that. </blockquote> Which isn't what I've suggested. <blockquote> Persisting the error at the fsync() level would essentially mean moving application policy into the kernel. </blockquote> Meh. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-02 20:38:06 </code></pre> On Mon, Apr 02, 2018 at 12:32:45PM -0700, Andres Freund wrote: <blockquote> On 2018-04-02 20:53:20 +0200, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote: <blockquote> Throwing away the dirty pages and persisting the error seems a lot more reasonable. Then provide a fcntl (or whatever) extension that can clear the error status in the few cases where the application wants to gracefully deal with the case. </blockquote> Given precisely that the dirty pages which cannot be written out are practically thrown away, the semantics of fsync() (after the 4.13 fixes) are essentially correct: the first call indicates that a writeback error indeed occurred, while subsequent calls have no reason to indicate an error (assuming no other errors occurred in the meantime). </blockquote> Meh^2. "no reason" - except that there's absolutely no way to know what state the data is in. And that your application needs explicit handling of such failures.
And that one FD might be used in lots of different parts of the application, that fsyncs in one part of the application might be an ok failure, and in another not. Requiring explicit actions to acknowledge "we've thrown away your data for unknown reason" seems entirely reasonable. </blockquote> As long as fsync() indicates error on first invocation, the application is fully aware that between this point in time and the last call to fsync() data has been lost. Persisting this error any further does not change this or add any new info - on the contrary it adds confusion as subsequent write()s and fsync()s on other pages can succeed, but will be reported as failures. The application will need to deal with that first error irrespective of subsequent return codes from fsync(). Conceptually every fsync() invocation demarcates an epoch for which it reports potential errors, so the caller needs to take responsibility for that particular epoch. Callers that are not affected by the potential outcome of fsync() and do not react on errors, have no reason for calling it in the first place (and thus masking failure from subsequent callers that may indeed care). <hr /> <pre><code>From:Stephen Frost <sfrost(at)snowman(dot)net> Date:2018-04-02 20:58:08 </code></pre> Greetings, Anthony Iliopoulos (ailiop(at)altatus(dot)com) wrote: <blockquote> On Mon, Apr 02, 2018 at 12:32:45PM -0700, Andres Freund wrote: <blockquote> On 2018-04-02 20:53:20 +0200, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote: <blockquote> Throwing away the dirty pages and persisting the error seems a lot more reasonable. Then provide a fcntl (or whatever) extension that can clear the error status in the few cases where the application wants to gracefully deal with the case.
</blockquote> Given precisely that the dirty pages which cannot be written out are practically thrown away, the semantics of fsync() (after the 4.13 fixes) are essentially correct: the first call indicates that a writeback error indeed occurred, while subsequent calls have no reason to indicate an error (assuming no other errors occurred in the meantime). </blockquote> Meh^2. "no reason" - except that there's absolutely no way to know what state the data is in. And that your application needs explicit handling of such failures. And that one FD might be used in lots of different parts of the application, that fsyncs in one part of the application might be an ok failure, and in another not. Requiring explicit actions to acknowledge "we've thrown away your data for unknown reason" seems entirely reasonable. </blockquote> As long as fsync() indicates error on first invocation, the application is fully aware that between this point in time and the last call to fsync() data has been lost. Persisting this error any further does not change this or add any new info - on the contrary it adds confusion as subsequent write()s and fsync()s on other pages can succeed, but will be reported as failures. </blockquote> fsync() doesn't reflect the status of given pages, however, it reflects the status of the file descriptor, and as such the file, on which it's called. This notion that fsync() is actually only responsible for the changes which were made to a file since the last fsync() call is pure foolishness. If we were able to pass a list of pages or data ranges to fsync() for it to verify they're on disk then perhaps things would be different, but we can't, all we can do is ask to "please flush all the dirty pages associated with this file descriptor, which represents this file we opened, to disk, and let us know if you were successful." Give us a way to ask "are these specific pages written out to persistent storage?"
and we would certainly be happy to use it, and to repeatedly try to flush out pages which weren't synced to disk due to some transient error, and to track those cases and make sure that we don't incorrectly assume that they've been transferred to persistent storage. <blockquote> The application will need to deal with that first error irrespective of subsequent return codes from fsync(). Conceptually every fsync() invocation demarcates an epoch for which it reports potential errors, so the caller needs to take responsibility for that particular epoch. </blockquote> We do deal with that error- by realizing that it failed and later retrying the fsync(), which is when we get back an "all good! everything with this file descriptor you've opened is sync'd!" and happily expect that to be truth, when, in reality, it's an unfortunate lie and there are still pages associated with that file descriptor which are, in reality, dirty and not sync'd to disk. Consider two independent programs where the first one writes to a file and then calls the second one whose job it is to go out and fsync(), perhaps async from the first, those files. Is the second program supposed to go write to each page that the first one wrote to, in order to ensure that all the dirty bits are set so that the fsync() will actually return if all the dirty pages are written? <blockquote> Callers that are not affected by the potential outcome of fsync() and do not react on errors, have no reason for calling it in the first place (and thus masking failure from subsequent callers that may indeed care). </blockquote> Reacting on an error from an fsync() call could, based on how it's documented and actually implemented in other OS's, mean "run another fsync() to see if the error has resolved itself." 
Requiring that to mean "you have to go dirty all of the pages you previously dirtied to actually get a subsequent fsync() to do anything" is really just not reasonable- a given program may have no idea what was written to previously nor any particular reason to need to know, on the expectation that the fsync() call will flush any dirty pages, as it's documented to do. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-02 23:05:44 </code></pre> Hi Stephen, On Mon, Apr 02, 2018 at 04:58:08PM -0400, Stephen Frost wrote: <blockquote> fsync() doesn't reflect the status of given pages, however, it reflects the status of the file descriptor, and as such the file, on which it's called. This notion that fsync() is actually only responsible for the changes which were made to a file since the last fsync() call is pure foolishness. If we were able to pass a list of pages or data ranges to fsync() for it to verify they're on disk then perhaps things would be different, but we can't, all we can do is ask to "please flush all the dirty pages associated with this file descriptor, which represents this file we opened, to disk, and let us know if you were successful." Give us a way to ask "are these specific pages written out to persistent storage?" and we would certainly be happy to use it, and to repeatedly try to flush out pages which weren't synced to disk due to some transient error, and to track those cases and make sure that we don't incorrectly assume that they've been transferred to persistent storage. </blockquote> Indeed fsync() is simply a rather blunt instrument and a narrow legacy interface but further changing its established semantics (no matter how unreasonable they may be) is probably not the way to go. Would using sync_file_range() be helpful? Potential errors would only apply to pages that cover the requested file ranges.
There are a few caveats though: (a) it still messes with the top-level error reporting so mixing it with callers that use fsync() and do care about errors will produce the same issue (clearing the error status). (b) the error-reporting granularity is coarse (failure reporting applies to the entire requested range so you still don't know which particular pages/file sub-ranges failed writeback). (c) the same "report and forget" semantics apply to repeated invocations of the sync_file_range() call, so again action will need to be taken upon the first error encountered for the particular ranges. <blockquote> <blockquote> The application will need to deal with that first error irrespective of subsequent return codes from fsync(). Conceptually every fsync() invocation demarcates an epoch for which it reports potential errors, so the caller needs to take responsibility for that particular epoch. </blockquote> We do deal with that error- by realizing that it failed and later retrying the fsync(), which is when we get back an "all good! everything with this file descriptor you've opened is sync'd!" and happily expect that to be truth, when, in reality, it's an unfortunate lie and there are still pages associated with that file descriptor which are, in reality, dirty and not sync'd to disk. </blockquote> It really turns out that this is not how the fsync() semantics work though, exactly because of the nature of the errors: even if the kernel retained the dirty bits on the failed pages, retrying persisting them on the same disk location would simply fail. Instead the kernel opts for marking those pages clean (since there is no other recovery strategy), and reporting once to the caller who can potentially deal with it in some manner. It is sadly a bad and undocumented convention. <blockquote> Consider two independent programs where the first one writes to a file and then calls the second one whose job it is to go out and fsync(), perhaps async from the first, those files.
Is the second program supposed to go write to each page that the first one wrote to, in order to ensure that all the dirty bits are set so that the fsync() will actually return if all the dirty pages are written? </blockquote> I think what you have in mind are the semantics of sync() rather than fsync(), but as long as an application needs to ensure data are persisted to storage, it needs to retain those data in its heap until fsync() is successful instead of discarding them and relying on the kernel after write(). The pattern should be roughly like: write() -> fsync() -> free(), rather than write() -> free() -> fsync(). For example, if a partition gets full upon fsync(), then the application has a chance to persist the data in a different location, while the kernel cannot possibly make this decision and recover. <blockquote> <blockquote> Callers that are not affected by the potential outcome of fsync() and do not react on errors, have no reason for calling it in the first place (and thus masking failure from subsequent callers that may indeed care). </blockquote> Reacting on an error from an fsync() call could, based on how it's documented and actually implemented in other OS's, mean "run another fsync() to see if the error has resolved itself." Requiring that to mean "you have to go dirty all of the pages you previously dirtied to actually get a subsequent fsync() to do anything" is really just not reasonable- a given program may have no idea what was written to previously nor any particular reason to need to know, on the expectation that the fsync() call will flush any dirty pages, as it's documented to do. </blockquote> I think we are conflating a few issues here: having the OS kernel being responsible for error recovery (so that subsequent fsync() would fix the problems) is one. 
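The write() -> fsync() -> free() ordering recommended here can be sketched as follows, using POSIX calls only. `persist` and `persist_then_free` are hypothetical names, and the fallback path stands in for the "persist the data in a different location" example. This is a sketch under those assumptions: a complete version would also fsync() the parent directory of a newly created file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write the whole buffer, retrying short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write buf to path and fsync() it. buf is not freed here, so after a
 * failure the data is still in memory and can be written elsewhere. */
static int persist(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = (write_all(fd, buf, len) == 0 && fsync(fd) == 0) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* write() -> fsync() -> free(): release the buffer only once the data is
 * known to be durable in one of the candidate locations. */
static int persist_then_free(char *buf, size_t len,
                             const char *primary, const char *fallback)
{
    int rc = persist(primary, buf, len);
    if (rc != 0 && fallback != NULL)
        rc = persist(fallback, buf, len); /* e.g. primary partition full */
    if (rc == 0)
        free(buf);                        /* only now is it safe to drop */
    return rc;                            /* on failure, caller still owns buf */
}
```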
This clearly is a design which most kernels have not really adopted for reasons outlined above (although having the FS layer recovering from hard errors transparently is open for discussion from what it seems [1]). Now, there is the issue of granularity of error reporting: userspace could benefit from a fine-grained indication of failed pages (or file ranges). Another issue is that of reporting semantics (report and clear), which is also a design choice made to avoid having higher-resolution error tracking and the corresponding memory overheads [1]. [1] <a href="https://lwn.net/Articles/718734/">https://lwn.net/Articles/718734/</a> <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-02 23:23:24 </code></pre> On 2018-04-03 01:05:44 +0200, Anthony Iliopoulos wrote: <blockquote> Would using sync_file_range() be helpful? Potential errors would only apply to pages that cover the requested file ranges. There are a few caveats though: </blockquote> To quote sync_file_range(2): <pre><code> Warning This system call is extremely dangerous and should not be used in portable programs. None of these operations writes out the file's metadata. Therefore, unless the application is strictly performing overwrites of already-instantiated disk blocks, there are no guarantees that the data will be available after a crash. There is no user interface to know if a write is purely an over‐ write. On filesystems using copy-on-write semantics (e.g., btrfs) an overwrite of existing allocated blocks is impossible. When writing into preallocated space, many filesystems also require calls into the block allocator, which this system call does not sync out to disk. This system call does not flush disk write caches and thus does not provide any data integrity on systems with volatile disk write caches. </code></pre> Given the lack of metadata safety that seems entirely a no go. 
We use sfr(2), but only to force the kernel's hand around writing back earlier without throwing away cache contents. <blockquote> <blockquote> <blockquote> The application will need to deal with that first error irrespective of subsequent return codes from fsync(). Conceptually every fsync() invocation demarcates an epoch for which it reports potential errors, so the caller needs to take responsibility for that particular epoch. </blockquote> We do deal with that error, by realizing that it failed and later retrying the fsync(), which is when we get back an "all good! everything with this file descriptor you've opened is sync'd!" and happily expect that to be the truth, when, in reality, it's an unfortunate lie and there are still pages associated with that file descriptor which are, in reality, dirty and not sync'd to disk. </blockquote> It really turns out that this is not how the fsync() semantics work though </blockquote> Except on FreeBSD and Solaris, and perhaps others. <blockquote> , exactly because the nature of the errors: even if the kernel retained the dirty bits on the failed pages, retrying persisting them on the same disk location would simply fail. </blockquote> That's not guaranteed at all; think NFS. <blockquote> Instead the kernel opts for marking those pages clean (since there is no other recovery strategy), and reporting once to the caller who can potentially deal with it in some manner. It is sadly a bad and undocumented convention. </blockquote> It's broken behaviour justified post facto with the only rationale that was available, which explains why it's so unconvincing. You could just say "this ship has sailed, and it's too onerous to change because xxx" and this'd be a done deal. But claiming this is reasonable behaviour is ridiculous. Again, you could just continue to error for this fd and still throw away the data.
<blockquote> <blockquote> Consider two independent programs where the first one writes to a file and then calls the second one whose job it is to go out and fsync(), perhaps async from the first, those files. Is the second program supposed to go write to each page that the first one wrote to, in order to ensure that all the dirty bits are set so that the fsync() will actually return if all the dirty pages are written? </blockquote> I think what you have in mind are the semantics of sync() rather than fsync() </blockquote> If you open the same file with two fds, and write with one, and fsync with another, that's definitely supposed to work. And sync() isn't a realistic replacement in any sort of way because it's obviously systemwide, and thus entirely and completely unsuitable. Nor does it have any sort of better error reporting behaviour, does it? <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-02 23:27:35 </code></pre> On 3 April 2018 at 07:05, Anthony Iliopoulos wrote: <blockquote> Hi Stephen, On Mon, Apr 02, 2018 at 04:58:08PM -0400, Stephen Frost wrote: <blockquote> fsync() doesn't reflect the status of given pages, however, it reflects the status of the file descriptor, and as such the file, on which it's called. This notion that fsync() is actually only responsible for the changes which were made to a file since the last fsync() call is pure foolishness. If we were able to pass a list of pages or data ranges to fsync() for it to verify they're on disk then perhaps things would be different, but we can't, all we can do is ask to "please flush all the dirty pages associated with this file descriptor, which represents this file we opened, to disk, and let us know if you were successful." Give us a way to ask "are these specific pages written out to persistent storage?"
and we would certainly be happy to use it, and to repeatedly try to flush out pages which weren't synced to disk due to some transient error, and to track those cases and make sure that we don't incorrectly assume that they've been transferred to persistent storage. </blockquote> Indeed fsync() is simply a rather blunt instrument and a narrow legacy interface but further changing its established semantics (no matter how unreasonable they may be) is probably not the way to go. </blockquote> They're undocumented and extremely surprising semantics that are arguably a violation of the POSIX spec for fsync(), or at least a surprising interpretation of it. So I don't buy this argument. <blockquote> It really turns out that this is not how the fsync() semantics work though, exactly because the nature of the errors: even if the kernel retained the dirty bits on the failed pages, retrying persisting them on the same disk location would simply fail. </blockquote> might simply fail. It depends on why the error occurred. I originally identified this behaviour on a multipath system. Multipath defaults to "throw the writes away, nobody really cares anyway" on error. It seems to figure a higher level will retry, or the application will receive the error and retry. (See no_path_retry in multipath config. AFAICS the default is insanely dangerous and only suitable for specialist apps that understand the quirks; you should use no_path_retry=queue.) <blockquote> Instead the kernel opts for marking those pages clean (since there is no other recovery strategy), and reporting once to the caller who can potentially deal with it in some manner. It is sadly a bad and undocumented convention. </blockquote> It could mark the FD. It's not just undocumented, it's a slightly creative interpretation of the POSIX spec for fsync.
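Andres's point that writing through one descriptor and calling fsync() through another on the same file "is definitely supposed to work" can be illustrated as follows. The helper name and file path are hypothetical. Both descriptors refer to the same inode and page cache, so on the success path the fsync() flushes pages dirtied through the other descriptor. The disputed part, what a later fsync() reports after a writeback failure, cannot be reproduced here without a faulty device. The second descriptor is opened read-only. Linux allows fsync() on such a descriptor, but other systems may require write access.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write through fd_w, then fsync() through an independently opened fd_s,
 * as a separate "syncing" program would. Returns 0 if both succeeded. */
static int write_then_fsync_elsewhere(const char *path, const char *data,
                                      size_t len)
{
    int fd_w = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd_w < 0)
        return -1;

    /* Linux permits fsync() on a read-only descriptor. */
    int fd_s = open(path, O_RDONLY);
    if (fd_s < 0) {
        close(fd_w);
        return -1;
    }

    int rc = -1;
    if (write(fd_w, data, len) == (ssize_t)len && fsync(fd_s) == 0)
        rc = 0;

    close(fd_s);
    close(fd_w);
    return rc;
}
```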
<blockquote> <blockquote> Consider two independent programs where the first one writes to a file and then calls the second one whose job it is to go out and fsync(), perhaps async from the first, those files. Is the second program supposed to go write to each page that the first one wrote to, in order to ensure that all the dirty bits are set so that the fsync() will actually return if all the dirty pages are written? </blockquote> I think what you have in mind are the semantics of sync() rather than fsync(), but as long as an application needs to ensure data are persisted to storage, it needs to retain those data in its heap until fsync() is successful instead of discarding them and relying on the kernel after write(). </blockquote> This is almost exactly what we tell application authors using PostgreSQL: the data isn't written until you receive a successful commit confirmation, so you'd better not forget it. We provide applications with clear boundaries so they can know exactly what was, and was not, written. I guess the argument from the kernel is that the same is true: whatever was written since the last successful fsync is potentially lost and must be redone. But the fsync behaviour is utterly undocumented and dubiously standard. <blockquote> I think we are conflating a few issues here: having the OS kernel being responsible for error recovery (so that subsequent fsync() would fix the problems) is one. This clearly is a design which most kernels have not really adopted for reasons outlined above </blockquote> [citation needed] What do other major platforms do here? The post above suggests it's a bit of a mix of behaviours. <blockquote> Now, there is the issue of granularity of error reporting: userspace could benefit from a fine-grained indication of failed pages (or file ranges). </blockquote> Yep. I looked at AIO in the hopes that, if we used AIO, we'd be able to map a sync failure back to an individual AIO write.
But it seems AIO just adds more problems and fixes none. Flush behaviour with AIO from what I can tell is inconsistent version to version and generally unhelpful. The kernel should really report such sync failures back to the app on its AIO write mapping, but it seems nothing of the sort happens. <hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-03 00:03:39 </code></pre> <blockquote> On Apr 2, 2018, at 16:27, Craig Ringer wrote: They're undocumented and extremely surprising semantics that are arguably a violation of the POSIX spec for fsync(), or at least a surprising interpretation of it. </blockquote> Even accepting that (I personally go with surprising over violation, as if my vote counted), it is highly unlikely that we will convince every kernel team to declare "What fools we've been!" and push a change... and even if they did, PostgreSQL can look forward to many years of running on kernels with the broken semantics. Given that, I think the PANIC option is the soundest one, as unappetizing as it is. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-03 00:05:09 </code></pre> On April 2, 2018 5:03:39 PM PDT, Christophe Pettus wrote: <blockquote> <blockquote> On Apr 2, 2018, at 16:27, Craig Ringer wrote: They're undocumented and extremely surprising semantics that are arguably a violation of the POSIX spec for fsync(), or at least a surprising interpretation of it. </blockquote> Even accepting that (I personally go with surprising over violation, as if my vote counted), it is highly unlikely that we will convince every kernel team to declare "What fools we've been!" and push a change... and even if they did, PostgreSQL can look forward to many years of running on kernels with the broken semantics. Given that, I think the PANIC option is the soundest one, as unappetizing as it is. </blockquote> Don't we pretty much already have agreement in that? And Craig is the main proponent of it? 
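The "PANIC option" amounts to treating any fsync() failure as fatal instead of retrying it. Here is a minimal sketch, assuming a hypothetical wrapper `fsync_or_panic()`. It is not PostgreSQL's implementation. Aborting forces crash recovery from the WAL, which the thread treats as the one place committed data is still known to exist, at the cost of an outage. Whether ENOSPC could be retried safely is the open question raised later in the thread, so the sketch deliberately does not special-case any errno value.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: never retry a failed fsync(). After a failure the
 * kernel may already have marked the dirty pages clean, so a later
 * "successful" fsync() would prove nothing; crash and let recovery
 * replay the WAL instead. */
static void fsync_or_panic(int fd, const char *what)
{
    if (fsync(fd) == 0)
        return;

    int err = errno;                   /* save before any other call */
    fprintf(stderr, "PANIC: could not fsync %s: %s\n", what, strerror(err));
    abort();                           /* no cleanup, no retry */
}
```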
<hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-03 00:07:41 </code></pre> <blockquote> On Apr 2, 2018, at 17:05, Andres Freund wrote: Don't we pretty much already have agreement in that? And Craig is the main proponent of it? </blockquote> For sure on the second sentence; the first was not clear to me. <hr /> <pre><code>From:Peter Geoghegan <pg(at)bowt(dot)ie> Date:2018-04-03 00:48:00 </code></pre> On Mon, Apr 2, 2018 at 5:05 PM, Andres Freund wrote: <blockquote> <blockquote> Even accepting that (I personally go with surprising over violation, as if my vote counted), it is highly unlikely that we will convince every kernel team to declare "What fools we've been!" and push a change... and even if they did, PostgreSQL can look forward to many years of running on kernels with the broken semantics. Given that, I think the PANIC option is the soundest one, as unappetizing as it is. </blockquote> Don't we pretty much already have agreement in that? And Craig is the main proponent of it? </blockquote> I wonder how bad it will be in practice if we PANIC. Craig said "This isn't as bad as it seems because AFAICS fsync only returns EIO in cases where we should be stopping the world anyway, and many FSes will do that for us". It would be nice to get more information on that. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-03 01:29:28 </code></pre> On Tue, Apr 3, 2018 at 3:03 AM, Craig Ringer wrote: <blockquote> I see little benefit to not just PANICing unconditionally on EIO, really. It shouldn't happen, and if it does, we want to be pretty conservative and adopt a data-protective approach. I'm rather more worried by doing it on ENOSPC. Which looks like it might be necessary from what I recall finding in my test case + kernel code reading. I really don't want to respond to a possibly-transient ENOSPC by PANICing the whole server unnecessarily. 
</blockquote> Yeah, it'd be nice to give an administrator the chance to free up some disk space after ENOSPC is reported, and stay up. Running out of space really shouldn't take down the database without warning! The question is whether the data remains in cache and marked dirty, so that retrying is a safe option (since it's potentially gone from our own buffers, so if the OS doesn't have it the only place your committed data can definitely still be found is the WAL... recovery time). Who can tell us? Do we need a per-filesystem answer? Delayed allocation is a somewhat filesystem-specific thing, so maybe. Interestingly, there don't seem to be many operating systems that can report ENOSPC from fsync(), based on a quick scan through some documentation: <pre><code>POSIX, AIX, HP-UX, FreeBSD, OpenBSD, NetBSD: no
Illumos/Solaris, Linux, macOS: yes
</code></pre> I don't know if macOS really means it or not; it just tells you to see the errors for read(2) and write(2). By the way, speaking of macOS, I was curious to see if the common BSD heritage would show here. Yeah, somewhat. It doesn't appear to keep buffers on writeback error, if this is the right code [1] (though it could be handling it somewhere else for all I know). [1] <a href="https://github.com/apple/darwin-xnu/blob/master/bsd/vfs/vfs_bio.c#L2695">https://github.com/apple/darwin-xnu/blob/master/bsd/vfs/vfs_bio.c#L2695</a> <hr /> <pre><code>From:Robert Haas <robertmhaas(at)gmail(dot)com> Date:2018-04-03 02:54:26 </code></pre> On Mon, Apr 2, 2018 at 2:53 PM, Anthony Iliopoulos wrote: <blockquote> Given precisely that the dirty pages which cannot been written-out are practically thrown away, the semantics of fsync() (after the 4.13 fixes) are essentially correct: the first call indicates that a writeback error indeed occurred, while subsequent calls have no reason to indicate an error (assuming no other errors occurred in the meantime).
</blockquote> Like other people here, I think this is 100% unreasonable, starting with "the dirty pages which cannot been written out are practically thrown away". Who decided that was OK, and on the basis of what wording in what specification? I think it's always unreasonable to throw away the user's data. If the writes are going to fail, then let them keep on failing every time. That wouldn't cause any data loss, because we'd never be able to checkpoint, and eventually the user would have to kill the server uncleanly, and that would trigger recovery. Also, this really does make it impossible to write reliable programs. Imagine that, while the server is running, somebody runs a program which opens a file in the data directory, calls fsync() on it, and closes it. If the fsync() fails, postgres is now borked and has no way of being aware of the problem. If we knew, we could PANIC, but we'll never find out, because the unrelated process ate the error. This is exactly the sort of ill-considered behavior that makes fcntl() locking nearly useless. Even leaving that aside, a PANIC means a prolonged outage on a prolonged system - it could easily take tens of minutes or longer to run recovery. So saying "oh, just do that" is not really an answer. Sure, we can do it, but it's like trying to lose weight by intentionally eating a tapeworm. Now, it's possible to shorten the checkpoint_timeout so that recovery runs faster, but then performance drops because data has to be fsync()'d more often instead of getting buffered in the OS cache for the maximum possible time. We could also dodge this issue in another way: suppose that when we write a page out, we don't consider it really written until fsync() succeeds. Then we wouldn't need to PANIC if an fsync() fails; we could just re-write the page. 
Unfortunately, this would also be terrible for performance, for pretty much the same reasons: letting the OS cache absorb lots of dirty blocks and do write-combining is necessary for good performance. <blockquote> The error reporting is thus consistent with the intended semantics (which are sadly not properly documented). Repeated calls to fsync() simply do not imply that the kernel will retry to writeback the previously-failed pages, so the application needs to be aware of that. Persisting the error at the fsync() level would essentially mean moving application policy into the kernel. </blockquote> I might accept this argument if I accepted that it was OK to decide that an fsync() failure means you can forget that the write() ever happened in the first place, but it's hard to imagine an application that wants that behavior. If the application didn't care about whether the bytes really got to disk or not, it would not have called fsync() in the first place. If it does care, reporting the error only once is never an improvement. <hr /> <pre><code>From:Peter Geoghegan <pg(at)bowt(dot)ie> Date:2018-04-03 03:45:30 </code></pre> On Mon, Apr 2, 2018 at 7:54 PM, Robert Haas wrote: <blockquote> Also, this really does make it impossible to write reliable programs. Imagine that, while the server is running, somebody runs a program which opens a file in the data directory, calls fsync() on it, and closes it. If the fsync() fails, postgres is now borked and has no way of being aware of the problem. If we knew, we could PANIC, but we'll never find out, because the unrelated process ate the error. This is exactly the sort of ill-considered behavior that makes fcntl() locking nearly useless. </blockquote> I fear that the conventional wisdom from the Kernel people is now "you should be using O_DIRECT for granular control". 
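For context, "using O_DIRECT" means bypassing the page cache so that write errors are typically reported by the write call itself rather than by a later fsync(). Below is a Linux-specific sketch. The helper name and the 4096-byte alignment are assumptions, and real alignment requirements depend on the device and filesystem. Some filesystems reject O_DIRECT with EINVAL, which the sketch reports as a distinct return value. O_DIRECT alone does not flush file metadata or a volatile device write cache, so fdatasync() is still called.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096   /* assumed; real code should query the device */

/* Write one aligned block with O_DIRECT.
 * Returns 0 on success, 1 if O_DIRECT is not supported here, -1 on error. */
static int write_block_direct(const char *path, char fill)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0)
        return errno == EINVAL ? 1 : -1;

    void *buf = NULL;
    /* O_DIRECT needs an aligned buffer, length and file offset. */
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_ALIGN) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, fill, DIRECT_ALIGN);

    int rc = -1;
    /* With O_DIRECT a device error is normally reported by this call. */
    ssize_t n = pwrite(fd, buf, DIRECT_ALIGN, 0);
    if (n == DIRECT_ALIGN && fdatasync(fd) == 0)
        rc = 0;                        /* data and needed metadata synced */
    else if (n < 0 && errno == EINVAL)
        rc = 1;                        /* alignment/O_DIRECT unsupported */

    free(buf);
    close(fd);
    return rc;
}
```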
The LWN article Thomas linked (<a href="https://lwn.net/Articles/718734">https://lwn.net/Articles/718734</a>) cites Ted Ts'o: "Monakhov asked why a counter was needed; Layton said it was to handle multiple overlapping writebacks. Effectively, the counter would record whether a writeback had failed since the file was opened or since the last fsync(). Ts'o said that should be fine; applications that want more information should use O_DIRECT. For most applications, knowledge that an error occurred somewhere in the file is all that is necessary; applications that require better granularity already use O_DIRECT." <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-03 10:35:39 </code></pre> Hi Robert, On Mon, Apr 02, 2018 at 10:54:26PM -0400, Robert Haas wrote: <blockquote> On Mon, Apr 2, 2018 at 2:53 PM, Anthony Iliopoulos wrote: <blockquote> Given precisely that the dirty pages which cannot been written-out are practically thrown away, the semantics of fsync() (after the 4.13 fixes) are essentially correct: the first call indicates that a writeback error indeed occurred, while subsequent calls have no reason to indicate an error (assuming no other errors occurred in the meantime). </blockquote> Like other people here, I think this is 100% unreasonable, starting with "the dirty pages which cannot been written out are practically thrown away". Who decided that was OK, and on the basis of what wording in what specification? I think it's always unreasonable to </blockquote> If you insist on strict conformance to POSIX, indeed the linux glibc configuration and associated manpage are probably wrong in stating that _POSIX_SYNCHRONIZED_IO is supported. The implementation matches that of the flexibility allowed by not supporting SIO. There's a long history of brokenness between linux and posix, and I think there was never an intention of conforming to the standard. <blockquote> throw away the user's data. 
If the writes are going to fail, then let them keep on failing every time. That wouldn't cause any data loss, because we'd never be able to checkpoint, and eventually the user would have to kill the server uncleanly, and that would trigger recovery. </blockquote> I believe (as I tried to explain earlier) there is a certain assumption being made that the writer and original owner of data is responsible for dealing with potential errors in order to avoid data loss (which should be only of interest to the original writer anyway). It would be very questionable for the interface to persist the error while subsequent writes and fsyncs to different offsets may as well go through. Another process may need to write into the file and fsync, and, being unaware of those newly introduced semantics, is now faced with EIO because some unrelated previous process failed some earlier writes and did not bother to clear the error for those writes. In a similar scenario where the second process is aware of the new semantics, it would naturally go ahead and clear the global error in order to proceed with its own write()+fsync(), which would essentially amount to the same problematic semantics you have now. <blockquote> Also, this really does make it impossible to write reliable programs. Imagine that, while the server is running, somebody runs a program which opens a file in the data directory, calls fsync() on it, and closes it. If the fsync() fails, postgres is now borked and has no way of being aware of the problem. If we knew, we could PANIC, but we'll never find out, because the unrelated process ate the error. This is exactly the sort of ill-considered behavior that makes fcntl() locking nearly useless. </blockquote> Fully agree, and the errseq_t fixes have dealt exactly with the issue of making sure that the error is reported to all file descriptors that happen to be open at the time of error.
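The errseq_t behaviour described here can be approximated with a small userspace model. This is a simplified simulation, not kernel code. The struct and function names are hypothetical, and details such as errseq_t's "seen" flag are left out. Each descriptor samples a per-file error counter when it is opened. fsync() then reports EIO only if the counter has advanced since that descriptor last checked. The failed pages themselves are assumed to be gone.

```c
#include <errno.h>

/* Per-file writeback error counter (a stand-in for errseq_t). */
struct file_model {
    unsigned err_seq;
};

/* Per-descriptor snapshot of that counter (loosely like f_wb_err). */
struct fd_model {
    struct file_model *file;
    unsigned seen;
};

/* open(): sample the counter, so errors from before the open are skipped. */
static struct fd_model model_open(struct file_model *f)
{
    struct fd_model fd = { f, f->err_seq };
    return fd;
}

/* Background writeback failed; the dirty pages are dropped. */
static void model_writeback_error(struct file_model *f)
{
    f->err_seq++;
}

/* Edge-triggered fsync(): report each new error once per descriptor. */
static int model_fsync(struct fd_model *fd)
{
    if (fd->seen != fd->file->err_seq) {
        fd->seen = fd->file->err_seq;
        return -EIO;
    }
    return 0;
}
```

In this model, two descriptors that are open when writeback fails each receive -EIO once. A second fsync() through either of them, or any fsync() through a descriptor opened afterwards, returns 0 even though the failed pages were never written. That later success is the behaviour objected to elsewhere in this thread.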
But I think one would have a hard time defending a modification to the kernel where this is further extended to cover cases where: process A does write() on some file offset which fails writeback, fsync() gets EIO and exit()s. process B does write() on some other offset which succeeds writeback, but fsync() gets EIO due to (uncleared) failures of earlier process. This would be a highly user-visible change of semantics from edge-triggered to level-triggered behavior. <blockquote> dodge this issue in another way: suppose that when we write a page out, we don't consider it really written until fsync() succeeds. Then </blockquote> That's the only way to think about fsync() guarantees unless you are on a kernel that keeps retrying to persist dirty pages. Assuming such a model, after repeated and unrecoverable hard failures the process would have to explicitly inform the kernel to drop the dirty pages. All the process could do at that point is read back to userspace the dirty/failed pages and attempt to rewrite them at a different place (which is currently possible too). Most applications, though, would not bother to inform the kernel and drop the permanently failed pages; and thus someone eventually would hit the case that a large amount of failed writeback pages are running his server out of memory, at which point people will complain that those semantics are completely unreasonable. <blockquote> we wouldn't need to PANIC if an fsync() fails; we could just re-write the page. Unfortunately, this would also be terrible for performance, for pretty much the same reasons: letting the OS cache absorb lots of dirty blocks and do write-combining is necessary for good performance. </blockquote> Not sure I understand this case. The application may indeed re-write a bunch of pages that have failed and proceed with fsync(). The kernel will deal with combining the writeback of all the re-written pages.
But further the necessity of combining for performance really depends on the exact storage medium. At the point you start caring about write-combining, the kernel community will naturally redirect you to use DIRECT_IO. <blockquote> <blockquote> The error reporting is thus consistent with the intended semantics (which are sadly not properly documented). Repeated calls to fsync() simply do not imply that the kernel will retry to writeback the previously-failed pages, so the application needs to be aware of that. Persisting the error at the fsync() level would essentially mean moving application policy into the kernel. </blockquote> I might accept this argument if I accepted that it was OK to decide that an fsync() failure means you can forget that the write() ever happened in the first place, but it's hard to imagine an application that wants that behavior. If the application didn't care about whether the bytes really got to disk or not, it would not have called fsync() in the first place. If it does care, reporting the error only once is never an improvement. </blockquote> Again, conflating two separate issues, that of buffering and retrying failed pages and that of error reporting. Yes it would be convenient for applications not to have to care at all about recovery of failed write-backs, but at some point they would have to face this issue one way or another (I am assuming we are always talking about hard failures, other kinds of failures are probably already being dealt with transparently at the kernel level). As for the reporting, it is also unreasonable to effectively signal and persist an error on a file-wide granularity while it pertains to subsets of that file and other writes can go through, but I am repeating myself. 
I suppose that if the check-and-clear semantics are problematic for Pg, one could suggest a kernel patch that opts in to level-triggered reporting of fsync() on a per-descriptor basis, which seems to be non-intrusive and probably sufficient to cover your expected use-case. <hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-03 11:26:05 </code></pre> On 3 April 2018 at 11:35, Anthony Iliopoulos wrote: <blockquote> Hi Robert, Fully agree, and the errseq_t fixes have dealt exactly with the issue of making sure that the error is reported to all file descriptors that happen to be open at the time of error. But I think one would have a hard time defending a modification to the kernel where this is further extended to cover cases where: process A does write() on some file offset which fails writeback, fsync() gets EIO and exit()s. process B does write() on some other offset which succeeds writeback, but fsync() gets EIO due to (uncleared) failures of earlier process. </blockquote> Surely that's exactly what process B would want? If it calls fsync and gets a success and later finds out that the file is corrupt and didn't match what was in memory it's not going to be happy. This seems like an attempt to co-opt fsync for a new and different purpose for which it's poorly designed. It's not an async error reporting mechanism for writes. It would be useless as that as any process could come along and open your file and eat the errors for writes you performed. An async error reporting mechanism would have to document which writes it was giving errors for and give you ways to control that. The semantics described here are useless for everyone. For a program needing to know the error status of the writes it executed, it doesn't know which writes are included in which fsync call. For a program using fsync for its original intended purpose of guaranteeing that all the writes are synced to disk, it no longer has any guarantee at all.
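Anthony's suggested opt-in, per-descriptor level-triggered reporting (close to Craig's earlier "It could mark the FD") could behave roughly as modelled here. Again, this is a hypothetical userspace model with invented names, not a proposed kernel interface. A descriptor that opts in keeps getting EIO from fsync() until it explicitly acknowledges the error. Other descriptors keep the one-shot behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* Per-file writeback error counter, as in an errseq_t-style model. */
struct lt_file {
    unsigned err_seq;
};

struct lt_fd {
    struct lt_file *file;
    unsigned seen;     /* counter value last reported to this descriptor */
    bool sticky;       /* opted in to level-triggered reporting */
    bool failed;       /* an error is pending and not yet acknowledged */
};

static struct lt_fd lt_open(struct lt_file *f, bool sticky)
{
    struct lt_fd fd = { f, f->err_seq, sticky, false };
    return fd;
}

static void lt_writeback_error(struct lt_file *f)
{
    f->err_seq++;
}

static int lt_fsync(struct lt_fd *fd)
{
    if (fd->seen != fd->file->err_seq) {
        fd->seen = fd->file->err_seq;
        if (fd->sticky)
            fd->failed = true;         /* remember it for later calls */
        return -EIO;
    }
    /* Level-triggered descriptors keep failing until the error is cleared. */
    return (fd->sticky && fd->failed) ? -EIO : 0;
}

/* Hypothetical explicit acknowledgement, e.g. after the caller has
 * re-written (or given up on) the affected data. */
static void lt_clear_error(struct lt_fd *fd)
{
    fd->failed = false;
}
```

What "acknowledging" the error should actually mean is the question Greg raises at the end of this part of the thread. The model leaves that decision to the caller.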
<blockquote> This would be a highly user-visible change of semantics from edge-triggered to level-triggered behavior. </blockquote> It was always documented as level-triggered. This edge-triggered concept is a complete surprise to application writers. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-03 13:36:47 </code></pre> On Tue, Apr 03, 2018 at 12:26:05PM +0100, Greg Stark wrote: <blockquote> On 3 April 2018 at 11:35, Anthony Iliopoulos wrote: <blockquote> Hi Robert, Fully agree, and the errseq_t fixes have dealt exactly with the issue of making sure that the error is reported to all file descriptors that happen to be open at the time of error. But I think one would have a hard time defending a modification to the kernel where this is further extended to cover cases where: process A does write() on some file offset which fails writeback, fsync() gets EIO and exit()s. process B does write() on some other offset which succeeds writeback, but fsync() gets EIO due to (uncleared) failures of earlier process. </blockquote> Surely that's exactly what process B would want? If it calls fsync and gets a success and later finds out that the file is corrupt and didn't match what was in memory it's not going to be happy. </blockquote> You can't possibly make this assumption. Process B may be reading and writing to completely disjoint regions from those of process A, and as such not really caring about earlier failures, only wanting to ensure its own writes go all the way through. But even if it did care, the file interfaces make no transactional guarantees. Even without fsync() there is nothing preventing process B from reading dirty pages from process A, and based on their content proceed to do its own business and write/persist new data, while process A further modifies the not-yet-flushed pages in-memory before flushing.
In this case you'd need explicit synchronization/locking between the processes anyway, so why would fsync() be an exception? <blockquote> This seems like an attempt to co-opt fsync for a new and different purpose for which it's poorly designed. It's not an async error reporting mechanism for writes. It would be useless as that as any process could come along and open your file and eat the errors for writes you performed. An async error reporting mechanism would have to document which writes it was giving errors for and give you ways to control that. </blockquote> The errseq_t fixes deal with that; errors will be reported to any process that has an open fd, irrespective of who is the actual caller of the fsync() that may have induced errors. This is anyway required as the kernel may evict dirty pages on its own by doing writeback and as such there needs to be a way to report errors on all open fds. <blockquote> The semantics described here are useless for everyone. For a program needing to know the error status of the writes it executed, it doesn't know which writes are included in which fsync call. For a program </blockquote> If EIO persists between invocations until explicitly cleared, a process cannot possibly make any decision as to if it should clear the error and proceed or some other process will need to leverage that without coordination, or which writes actually failed for that matter. We would be back to the case of requiring explicit synchronization between processes that care about this, in which case the processes may as well synchronize over calling fsync() in the first place. Having an opt-in persisting EIO per-fd would practically be a form of "contract" between "cooperating" processes anyway. But instead of deconstructing and debating the semantics of the current mechanism, why not come up with the ideal desired form of error reporting/tracking granularity etc., and see how this may be fitted into kernels as a new interface.
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-03 14:29:10 </code></pre> On 3 April 2018 at 10:54, Robert Haas wrote: <blockquote> I think it's always unreasonable to throw away the user's data. </blockquote> Well, we do that. If a txn aborts, all writes in the txn are discarded. I think that's perfectly reasonable. Though we also promise an all or nothing effect, we make exceptions even there. The FS doesn't offer transactional semantics, but the fsync behaviour can be interpreted kind of similarly. I don't agree with it, but I don't think it's as wholly unreasonable as all that. I think leaving it undocumented is absolutely gobsmacking, and it's dubious at best, but it's not totally insane. <blockquote> If the writes are going to fail, then let them keep on failing every time. </blockquote> Like we do, where we require an explicit rollback. But POSIX may pose issues there; it doesn't really define any interface for that AFAIK. Unless you expect the app to close() and re-open() the file. Replacing one nonstandard issue with another may not be a win. <blockquote> That wouldn't cause any data loss, because we'd never be able to checkpoint, and eventually the user would have to kill the server uncleanly, and that would trigger recovery. </blockquote> Yep. That's what I expected to happen on unrecoverable I/O errors. Because, y'know, unrecoverable. I was stunned to learn it's not so. And I'm even more amazed to learn that ext4's errors=remount-ro apparently doesn't concern itself with mere user data, and may exhibit the same behaviour; I need to rerun my test case on it tomorrow. <blockquote> Also, this really does make it impossible to write reliable programs. </blockquote> In the presence of multiple apps interacting on the same file, yes. I think that's a little bit of a stretch though. For a single app, you can recover by remembering and redoing all the writes you did.
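Craig's single-application recovery, remembering every write issued since the last successful fsync() and redoing them if it fails, can be sketched as follows. The fixed-size table, the `MAX_PENDING`/`MAX_CHUNK` limits and the `redo_*` names are assumptions for illustration. PostgreSQL relies on its WAL rather than anything like this. Re-issuing the writes re-dirties the pages, so a retried fsync() has real work to do instead of reporting success over data the kernel already dropped.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_PENDING 16   /* assumed bound on un-synced writes */
#define MAX_CHUNK   64   /* assumed maximum size of a single write */

/* A private copy of one write issued since the last successful fsync(). */
struct pending_write {
    off_t off;
    size_t len;
    char data[MAX_CHUNK];
};

/* A file plus everything written to it that is not yet known durable. */
struct redo_file {
    int fd;
    size_t n;
    struct pending_write w[MAX_PENDING];
};

static int redo_open(struct redo_file *rf, const char *path)
{
    rf->n = 0;
    rf->fd = open(path, O_RDWR | O_CREAT, 0644);
    return rf->fd < 0 ? -1 : 0;
}

/* pwrite() and keep a copy of the bytes until a later fsync() succeeds. */
static int redo_pwrite(struct redo_file *rf, const void *buf, size_t len,
                       off_t off)
{
    if (len > MAX_CHUNK || rf->n == MAX_PENDING)
        return -1;                     /* caller should redo_fsync() first */
    if (pwrite(rf->fd, buf, len, off) != (ssize_t)len)
        return -1;

    struct pending_write *p = &rf->w[rf->n++];
    p->off = off;
    p->len = len;
    memcpy(p->data, buf, len);
    return 0;
}

/* fsync(); if it fails, re-issue every remembered write (re-dirtying the
 * pages) and try again. The copies are dropped only after a success. */
static int redo_fsync(struct redo_file *rf, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (fsync(rf->fd) == 0) {
            rf->n = 0;
            return 0;
        }
        for (size_t i = 0; i < rf->n; i++) {
            const struct pending_write *p = &rf->w[i];
            if (pwrite(rf->fd, p->data, p->len, p->off) != (ssize_t)p->len)
                return -1;
        }
    }
    return -1;                         /* copies kept; caller decides */
}
```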
Sucks if your app wants to have multiple processes working together on a file without some kind of journal or WAL, relying on fsync() alone, mind you. But at least we have WAL. Hrm. I wonder how this interacts with wal_level=minimal. <blockquote> Even leaving that aside, a PANIC means a prolonged outage on a prolonged system - it could easily take tens of minutes or longer to run recovery. So saying "oh, just do that" is not really an answer. Sure, we can do it, but it's like trying to lose weight by intentionally eating a tapeworm. Now, it's possible to shorten the checkpoint_timeout so that recovery runs faster, but then performance drops because data has to be fsync()'d more often instead of getting buffered in the OS cache for the maximum possible time. </blockquote> It's also spikier. Users have more issues with latency with short, frequent checkpoints. <blockquote> We could also dodge this issue in another way: suppose that when we write a page out, we don't consider it really written until fsync() succeeds. Then we wouldn't need to PANIC if an fsync() fails; we could just re-write the page. Unfortunately, this would also be terrible for performance, for pretty much the same reasons: letting the OS cache absorb lots of dirty blocks and do write-combining is necessary for good performance. </blockquote> Our double-caching is already plenty bad enough anyway, as well. (Ideally I want to be able to swap buffers between shared_buffers and the OS buffer-cache. Almost like a 2nd level of buffer pinning. When we write out a block, we transfer ownership to the OS. Yeah, I'm dreaming. But we'd sure need to be able to trust the OS not to just forget the block then!) <blockquote> <blockquote> The error reporting is thus consistent with the intended semantics (which are sadly not properly documented). Repeated calls to fsync() simply do not imply that the kernel will retry to writeback the previously-failed pages, so the application needs to be aware of that. 
Persisting the error at the fsync() level would essentially mean moving application policy into the kernel. </blockquote> I might accept this argument if I accepted that it was OK to decide that an fsync() failure means you can forget that the write() ever happened in the first place, but it's hard to imagine an application that wants that behavior. If the application didn't care about whether the bytes really got to disk or not, it would not have called fsync() in the first place. If it does care, reporting the error only once is never an improvement. </blockquote> Many RDBMSes do just that. It's hardly behaviour unique to the kernel. They report an ERROR on a statement in a txn then go on with life, merrily forgetting that anything was ever wrong. I agree with PostgreSQL's stance that this is wrong. We require an explicit rollback (or ROLLBACK TO SAVEPOINT) to restore the session to a usable state. This is good. But we're the odd one out there. Almost everyone else does much like what fsync() does on Linux, report the error and forget it. In any case, we're not going to get anyone to backpatch a fix for this into all kernels, so we're stuck working around it. I'll do some testing with ENOSPC tomorrow, propose a patch, report back. <hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-03 14:37:30 </code></pre> On 3 April 2018 at 14:36, Anthony Iliopoulos wrote: <blockquote> If EIO persists between invocations until explicitly cleared, a process cannot possibly make any decision as to if it should clear the error </blockquote> I still don't understand what "clear the error" means here. The writes still haven't been written out. We don't care about tracking errors, we just care whether all the writes to the file have been flushed to disk. By "clear the error" you mean throw away the dirty pages and revert part of the file to some old data? Why would anyone ever want that? 
<blockquote> But instead of deconstructing and debating the semantics of the current mechanism, why not come up with the ideal desired form of error reporting/tracking granularity etc., and see how this may be fitted into kernels as a new interface. </blockquote> Because Postgres is portable software that won't be able to use some Linux-specific interface. And doesn't really need any granular error reporting system anyways. It just needs to know when all writes have been synced to disk. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-03 16:52:07 </code></pre> On Tue, Apr 03, 2018 at 03:37:30PM +0100, Greg Stark wrote: <blockquote> On 3 April 2018 at 14:36, Anthony Iliopoulos wrote: <blockquote> If EIO persists between invocations until explicitly cleared, a process cannot possibly make any decision as to if it should clear the error </blockquote> I still don't understand what "clear the error" means here. The writes still haven't been written out. We don't care about tracking errors, we just care whether all the writes to the file have been flushed to disk. By "clear the error" you mean throw away the dirty pages and revert part of the file to some old data? Why would anyone ever want that? </blockquote> It means that the responsibility of recovering the data is passed back to the application. The writes may never be able to be written out. How would a kernel deal with that? Either discard the data (and have the writer acknowledge) or buffer the data until reboot and simply risk going OOM. It's not what someone would want, but rather need to deal with, one way or the other. At least on the application-level there's a fighting chance for restoring to a consistent state. The kernel does not have that opportunity. 
<blockquote> <blockquote> But instead of deconstructing and debating the semantics of the current mechanism, why not come up with the ideal desired form of error reporting/tracking granularity etc., and see how this may be fitted into kernels as a new interface. </blockquote> Because Postgres is portable software that won't be able to use some Linux-specific interface. And doesn't really need any granular error </blockquote> I don't really follow this argument, Pg is admittedly using non-portable interfaces (e.g. the sync_file_range()). While it's nice to avoid platform specific hacks, expecting that the POSIX semantics will be consistent across systems is simply a 90's pipe dream. While it would be lovely to have really consistent interfaces for application writers, this is simply not going to happen any time soon. And since those problematic semantics of fsync() appear to be prevalent in other systems as well that are not likely to be changed, you cannot rely on the preconception that once buffers are handed over to the kernel you have a guarantee that they will be eventually persisted no matter what. (Why even bother having fsync() in that case? The kernel would eventually evict and writeback dirty pages anyway. The point of reporting the error back to the application is to give it a chance to recover - the kernel could repeat "fsync()" itself internally if this would solve anything). <blockquote> reporting system anyways. It just needs to know when all writes have been synced to disk. </blockquote> Well, it does know when some writes have not been synced to disk, exactly because the responsibility is passed back to the application. I do realize this puts more burden back to the application, but what would a viable alternative be? Would you rather have a kernel that risks periodically going OOM due to this design decision? 
<hr /> <pre><code>From:Robert Haas <robertmhaas(at)gmail(dot)com> Date:2018-04-03 21:47:01 </code></pre> On Tue, Apr 3, 2018 at 6:35 AM, Anthony Iliopoulos wrote: <blockquote> <blockquote> Like other people here, I think this is 100% unreasonable, starting with "the dirty pages which cannot been written out are practically thrown away". Who decided that was OK, and on the basis of what wording in what specification? I think it's always unreasonable to </blockquote> If you insist on strict conformance to POSIX, indeed the linux glibc configuration and associated manpage are probably wrong in stating that _POSIX_SYNCHRONIZED_IO is supported. The implementation matches that of the flexibility allowed by not supporting SIO. There's a long history of brokenness between linux and posix, and I think there was never an intention of conforming to the standard. </blockquote> Well, then the man page probably shouldn't say CONFORMING TO 4.3BSD, POSIX.1-2001, which on the first system I tested, it did. Also, the summary should be changed from the current "fsync, fdatasync - synchronize a file's in-core state with storage device" by adding ", possibly by randomly undoing some of the changes you think you made to the file". <blockquote> I believe (as tried to explain earlier) there is a certain assumption being made that the writer and original owner of data is responsible for dealing with potential errors in order to avoid data loss (which should be only of interest to the original writer anyway). It would be very questionable for the interface to persist the error while subsequent writes and fsyncs to different offsets may as well go through. </blockquote> No, that's not questionable at all. fsync() doesn't take any argument saying which part of the file you care about, so the kernel is entirely not entitled to assume it knows to which writes a given fsync() call was intended to apply. 
<blockquote> Another process may need to write into the file and fsync, while being unaware of those newly introduced semantics is now faced with EIO because some unrelated previous process failed some earlier writes and did not bother to clear the error for those writes. In a similar scenario where the second process is aware of the new semantics, it would naturally go ahead and clear the global error in order to proceed with its own write()+fsync(), which would essentially amount to the same problematic semantics you have now. </blockquote> I don't deny that it's possible that somebody could have an application which is utterly indifferent to the fact that earlier modifications to a file failed due to I/O errors, but is A-OK with that as long as later modifications can be flushed to disk, but I don't think that's a normal thing to want. <blockquote> <blockquote> Also, this really does make it impossible to write reliable programs. Imagine that, while the server is running, somebody runs a program which opens a file in the data directory, calls fsync() on it, and closes it. If the fsync() fails, postgres is now borked and has no way of being aware of the problem. If we knew, we could PANIC, but we'll never find out, because the unrelated process ate the error. This is exactly the sort of ill-considered behavior that makes fcntl() locking nearly useless. </blockquote> Fully agree, and the errseq_t fixes have dealt exactly with the issue of making sure that the error is reported to all file descriptors that happen to be open at the time of error. </blockquote> Well, in PostgreSQL, we have a background process called the checkpointer which is the process that normally does all of the fsync() calls but only a subset of the write() calls. The checkpointer does not, however, necessarily have every file open all the time, so these fixes aren't sufficient to make sure that the checkpointer ever sees an fsync() failure. 
What you have (or someone has) basically done here is made an undocumented assumption about which file descriptors might care about a particular error, but it just so happens that PostgreSQL has never conformed to that assumption. You can keep on saying the problem is with our assumptions, but it doesn't seem like a very good guess to me to suppose that we're the only program that has ever made them. The documentation for fsync() gives zero indication that it's edge-triggered, and so complaining that people wouldn't like it if it became level-triggered seems like an ex post facto justification for a poorly-chosen behavior: they probably think (as we did prior to a week ago) that it already is. <blockquote> Not sure I understand this case. The application may indeed re-write a bunch of pages that have failed and proceed with fsync(). The kernel will deal with combining the writeback of all the re-written pages. But further the necessity of combining for performance really depends on the exact storage medium. At the point you start caring about write-combining, the kernel community will naturally redirect you to use DIRECT_IO. </blockquote> Well, the way PostgreSQL works today, we typically run with say 8GB of shared_buffers even if the system memory is, say, 200GB. As pages are evicted from our relatively small cache to the operating system, we track which files need to be fsync()'d at checkpoint time, but we don't hold onto the blocks. Until checkpoint time, the operating system is left to decide whether it's better to keep caching the dirty blocks (thus leaving less memory for other things, but possibly allowing write-combining if the blocks are written again) or whether it should clean them to make room for other things. This means that only a small portion of the operating system memory is directly managed by PostgreSQL, while allowing the effective size of our cache to balloon to some very large number if the system isn't under heavy memory pressure. 
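The split described here, where backends write and close while the checkpointer later reopens and fsyncs, can be sketched with plain POSIX calls. This is an illustration under assumptions, not PostgreSQL's code: `backend_write()` and `checkpointer_fsync()` are hypothetical, and the list of files pending fsync is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical "backend": write data and close the file without fsync().
 * The dirty pages now live only in the OS page cache; the caller would
 * also record the path in some pending-fsync list (not shown). */
static int backend_write(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    int rc = (write(fd, data, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical "checkpointer": at checkpoint time, open the file again
 * through a new descriptor (possibly in another process) and fsync() it.
 * Whether this fsync() reports a writeback failure that happened between
 * the two calls is the open question in this thread. */
static int checkpointer_fsync(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both calls succeed on a healthy disk. The interesting part is the window between them: if asynchronous writeback fails there, the checkpointer's descriptor did not exist yet when the error was recorded.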
Now, I hear the DIRECT_IO thing and I assume we're eventually going to have to go that way: Linux kernel developers seem to think that "real men use O_DIRECT" and so if other forms of I/O don't provide useful guarantees, well that's our fault for not using O_DIRECT. That's a political reason, not a technical reason, but it's a reason all the same. Unfortunately, that is going to add a huge amount of complexity, because if we ran with shared_buffers set to a large percentage of system memory, we couldn't allocate large chunks of memory for sorts and hash tables from the operating system any more. We'd have to allocate it from our own shared_buffers because that's basically all the memory there is and using substantially more might run the system out entirely. So it's a huge, huge architectural change. And even once it's done it is in some ways inferior to what we are doing today -- true, it gives us superior control over writeback timing, but it also makes PostgreSQL play less nicely with other things running on the same machine, because now PostgreSQL has a dedicated chunk of whatever size it has, rather than using some portion of the OS buffer cache that can grow and shrink according to memory needs both of other parts of PostgreSQL and other applications on the system. <blockquote> I suppose that if the check-and-clear semantics are problematic for Pg, one could suggest a kernel patch that opts-in to a level-triggered reporting of fsync() on a per-descriptor basis, which seems to be non-intrusive and probably sufficient to cover your expected use-case. </blockquote> That would certainly be better than nothing. 
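To contrast the edge-triggered reporting discussed above with the opt-in, level-triggered variant suggested in the quoted reply, here is a toy per-descriptor model. No such kernel flag exists; `struct fd_err`, `fd_fsync()` and `fd_clear_error()` are hypothetical and only model the two semantics as described in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-descriptor error state. */
struct fd_err {
    int level_triggered;  /* 1 if this descriptor opted in */
    int pending;          /* negative errno, or 0 */
};

static void note_writeback_error(struct fd_err *f, int err)
{
    f->pending = err;
}

/* Edge-triggered (behaviour described in this thread): the error is
 * returned once, then forgotten.
 * Level-triggered (proposal): it keeps being returned until cleared. */
static int fd_fsync(struct fd_err *f)
{
    int err = f->pending;
    if (!f->level_triggered)
        f->pending = 0;
    return err;
}

/* Explicit acknowledgement, e.g. after the application rewrote the data. */
static void fd_clear_error(struct fd_err *f)
{
    f->pending = 0;
}
```

Under the level-triggered model, a checkpoint retried after a failure keeps failing until someone deliberately clears the error, which is the behaviour Robert argues PostgreSQL assumed all along.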
<hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-03 23:59:27 </code></pre> On Tue, Apr 3, 2018 at 1:29 PM, Thomas Munro wrote: <blockquote> Interestingly, there don't seem to be many operating systems that can report ENOSPC from fsync(), based on a quick scan through some documentation: POSIX, AIX, HP-UX, FreeBSD, OpenBSD, NetBSD: no Illumos/Solaris, Linux, macOS: yes </blockquote> Oops, reading comprehension fail. POSIX yes (since issue 5), via the note that read() and write()'s error conditions can also be returned. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 00:56:37 </code></pre> On Tue, Apr 3, 2018 at 05:47:01PM -0400, Robert Haas wrote: <blockquote> Well, in PostgreSQL, we have a background process called the checkpointer which is the process that normally does all of the fsync() calls but only a subset of the write() calls. The checkpointer does not, however, necessarily have every file open all the time, so these fixes aren't sufficient to make sure that the checkpointer ever sees an fsync() failure. </blockquote> There has been a lot of focus in this thread on the workflow: <pre><code>write() -> blocks remain in kernel memory -> fsync() -> panic? </code></pre> But what happens in this workflow: <pre><code>write() -> kernel syncs blocks to storage -> fsync() </code></pre> Is fsync() going to see a "kernel syncs blocks to storage" failure? There was already discussion that if the fsync() causes the "syncs blocks to storage", fsync() will only report the failure once, but will it see any failure in the second workflow? There is indication that a failed write to storage reports back an error once and clears the dirty flag, but do we know it keeps things around long enough to report an error to a future fsync()? You would think it does, but I have to ask since our fsync() assumptions have been wrong for so long. 
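The second workflow, where the kernel starts writing the blocks back before anyone calls fsync(), can be provoked on Linux with sync_file_range(), the non-portable call mentioned earlier in the thread. A minimal Linux-only sketch; `write_then_early_writeback()` is a hypothetical helper, and the code does not answer the question, it only sets up the sequence being asked about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write, start writeback early, then fsync(). */
static int write_then_early_writeback(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    /* Initiate writeback of the whole file (nbytes == 0 means "to EOF")
     * without waiting for it to complete. */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
        return -1;
    /* If that writeback fails, the question is whether this fsync(),
     * issued afterwards on the same descriptor, reports it. */
    return fsync(fd);
}
```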
<hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-04 01:54:50 </code></pre> On Wed, Apr 4, 2018 at 12:56 PM, Bruce Momjian wrote: <blockquote> There has been a lot of focus in this thread on the workflow: <pre><code> write() -> blocks remain in kernel memory -> fsync() -> panic? </code></pre> But what happens in this workflow: <pre><code> write() -> kernel syncs blocks to storage -> fsync() </code></pre> Is fsync() going to see a "kernel syncs blocks to storage" failure? There was already discussion that if the fsync() causes the "syncs blocks to storage", fsync() will only report the failure once, but will it see any failure in the second workflow? There is indication that a failed write to storage reports back an error once and clears the dirty flag, but do we know it keeps things around long enough to report an error to a future fsync()? You would think it does, but I have to ask since our fsync() assumptions have been wrong for so long. </blockquote> I believe there were some problems of that nature (with various twists, based on other concurrent activity and possibly different fds), and those problems were fixed by the errseq_t system developed by Jeff Layton in Linux 4.13. Call that "bug #1". The second issue is that the pages are marked clean after the error is reported, so further attempts to fsync() the data (in our case for a new attempt to checkpoint) will be futile but appear successful. Call that "bug #2", with the proviso that some people apparently think it's reasonable behaviour and not a bug. At least there is a plausible workaround for that: namely the nuclear option proposed by Craig. 
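The "nuclear option" amounts to never retrying a failed fsync(): treat the first failure as fatal and let crash recovery replay the WAL. A minimal sketch, where `fsync_or_panic()` is a hypothetical helper and abort() stands in for PostgreSQL's PANIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Never retry a failed fsync(): with "bug #2", a retry can "succeed"
 * because the failed pages were already marked clean. Crash instead,
 * so that recovery replays the WAL. abort() stands in for PANIC. */
static void fsync_or_panic(int fd, const char *what)
{
    if (fsync(fd) != 0) {
        fprintf(stderr, "PANIC: could not fsync file \"%s\": %s\n",
                what, strerror(errno));
        abort();
    }
}
```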
<hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 02:05:19 </code></pre> On Wed, Apr 4, 2018 at 01:54:50PM +1200, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 12:56 PM, Bruce Momjian wrote: <blockquote> There has been a lot of focus in this thread on the workflow: <pre><code> write() -> blocks remain in kernel memory -> fsync() -> panic? </code></pre> But what happens in this workflow: <pre><code> write() -> kernel syncs blocks to storage -> fsync() </code></pre> Is fsync() going to see a "kernel syncs blocks to storage" failure? There was already discussion that if the fsync() causes the "syncs blocks to storage", fsync() will only report the failure once, but will it see any failure in the second workflow? There is indication that a failed write to storage reports back an error once and clears the dirty flag, but do we know it keeps things around long enough to report an error to a future fsync()? You would think it does, but I have to ask since our fsync() assumptions have been wrong for so long. </blockquote> I believe there were some problems of that nature (with various twists, based on other concurrent activity and possibly different fds), and those problems were fixed by the errseq_t system developed by Jeff Layton in Linux 4.13. Call that "bug #1". </blockquote> So all our non-cutting-edge Linux systems are vulnerable and there is no workaround Postgres can implement? Wow. <blockquote> The second issue is that the pages are marked clean after the error is reported, so further attempts to fsync() the data (in our case for a new attempt to checkpoint) will be futile but appear successful. Call that "bug #2", with the proviso that some people apparently think it's reasonable behaviour and not a bug. At least there is a plausible workaround for that: namely the nuclear option proposed by Craig. </blockquote> Yes, that one I understood. 
<hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 02:14:28 </code></pre> On Tue, Apr 3, 2018 at 10:05:19PM -0400, Bruce Momjian wrote: <blockquote> On Wed, Apr 4, 2018 at 01:54:50PM +1200, Thomas Munro wrote: <blockquote> I believe there were some problems of that nature (with various twists, based on other concurrent activity and possibly different fds), and those problems were fixed by the errseq_t system developed by Jeff Layton in Linux 4.13. Call that "bug #1". </blockquote> So all our non-cutting-edge Linux systems are vulnerable and there is no workaround Postgres can implement? Wow. </blockquote> Uh, are you sure it fixes our use-case? From the email description it sounded like it only reported fsync errors for every open file descriptor at the time of the failure, but the checkpoint process might open the file after the failure and try to fsync a write that happened before the failure. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 02:40:16 </code></pre> On 4 April 2018 at 05:47, Robert Haas wrote: <blockquote> Now, I hear the DIRECT_IO thing and I assume we're eventually going to have to go that way: Linux kernel developers seem to think that "real men use O_DIRECT" and so if other forms of I/O don't provide useful guarantees, well that's our fault for not using O_DIRECT. That's a political reason, not a technical reason, but it's a reason all the same. </blockquote> I looked into buffered AIO a while ago, by the way, and just ... hell no. Run, run as fast as you can. The trouble with direct I/O is that it pushes a lot of work back on PostgreSQL regarding knowledge of the storage subsystem, I/O scheduling, etc. It's absurd to have the kernel do this, unless you want it reliable, in which case you bypass it and drive the hardware directly. We'd need pools of writer threads to deal with all the blocking I/O. It'd be such a nightmare. Hey, why bother having a kernel at all, except for drivers? 
<hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-04 02:44:22 </code></pre> On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian wrote: <blockquote> On Tue, Apr 3, 2018 at 10:05:19PM -0400, Bruce Momjian wrote: <blockquote> On Wed, Apr 4, 2018 at 01:54:50PM +1200, Thomas Munro wrote: <blockquote> I believe there were some problems of that nature (with various twists, based on other concurrent activity and possibly different fds), and those problems were fixed by the errseq_t system developed by Jeff Layton in Linux 4.13. Call that "bug #1". </blockquote> So all our non-cutting-edge Linux systems are vulnerable and there is no workaround Postgres can implement? Wow. </blockquote> Uh, are you sure it fixes our use-case? From the email description it sounded like it only reported fsync errors for every open file descriptor at the time of the failure, but the checkpoint process might open the file after the failure and try to fsync a write that happened before the failure. </blockquote> I'm not sure of anything. I can see that it's designed to report errors since the last fsync() of the file (presumably via any fd), which sounds like the desired behaviour: <a href="https://github.com/torvalds/linux/blob/master/mm/filemap.c#L682">https://github.com/torvalds/linux/blob/master/mm/filemap.c#L682</a> <blockquote> When userland calls fsync (or something like nfsd does the equivalent), we want to report any writeback errors that occurred since the last fsync (or since the file was opened if there haven't been any). </blockquote> But I'm not sure what the lifetime of the passed-in "file" and more importantly "file->f_wb_err" is. Specifically, what happens to it if no one has the file open at all, between operations? It is reference counted, see fs/file_table.c. I don't know enough about it to comment. 
<hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-04 05:29:28 </code></pre> On Wed, Apr 4, 2018 at 2:44 PM, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian wrote: <blockquote> Uh, are you sure it fixes our use-case? From the email description it sounded like it only reported fsync errors for every open file descriptor at the time of the failure, but the checkpoint process might open the file after the failure and try to fsync a write that happened before the failure. </blockquote> I'm not sure of anything. I can see that it's designed to report errors since the last fsync() of the file (presumably via any fd), which sounds like the desired behaviour: [..] </blockquote> Scratch that. Whenever you open a file descriptor you can't see any preceding errors at all, because: <pre><code>/* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); </code></pre> <a href="https://github.com/torvalds/linux/blob/master/fs/open.c#L752">https://github.com/torvalds/linux/blob/master/fs/open.c#L752</a> Our whole design is based on being able to open, close and reopen files at will from any process, and in particular to fsync() from a different process that didn't inherit the fd but instead opened it later. But it looks like that might be able to eat errors that occurred during asynchronous writeback (when there was nobody to report them to), before you opened the file? If so I'm not sure how that can possibly be considered to be an implementation of _POSIX_SYNCHRONIZED_IO: "the fsync() function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state." Note "the file", not "this file descriptor + copies", and without reference to when you opened it. 
<blockquote> But I'm not sure what the lifetime of the passed-in "file" and more importantly "file->f_wb_err" is. </blockquote> It's really inode->i_mapping->wb_err's lifetime that I should have been asking about there, not file->f_wb_err, but I see now that that question is irrelevant due to the above. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 06:00:21 </code></pre> On 4 April 2018 at 13:29, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:44 PM, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian wrote: <blockquote> Uh, are you sure it fixes our use-case? From the email description it sounded like it only reported fsync errors for every open file descriptor at the time of the failure, but the checkpoint process might open the file after the failure and try to fsync a write that happened before the failure. </blockquote> I'm not sure of anything. I can see that it's designed to report errors since the last fsync() of the file (presumably via any fd), which sounds like the desired behaviour: [..] </blockquote> Scratch that. Whenever you open a file descriptor you can't see any preceding errors at all, because: /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); <a href="https://github.com/torvalds/linux/blob/master/fs/open.c#L752">https://github.com/torvalds/linux/blob/master/fs/open.c#L752</a> Our whole design is based on being able to open, close and reopen files at will from any process, and in particular to fsync() from a different process that didn't inherit the fd but instead opened it later. But it looks like that might be able to eat errors that occurred during asynchronous writeback (when there was nobody to report them to), before you opened the file? </blockquote> Holy hell. 
So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us? I'll see if I can expand my testcase for that. I'm presently dockerizing it to make it easier for others to use, but that turns out to be a major pain when using devmapper etc. Docker in privileged mode doesn't seem to play nice with device-mapper. Does that mean that the ONLY ways to do reliable I/O are: <ul> <li>single-process, single-file-descriptor write() then fsync(); on failure, retry all work since last successful fsync()</li> <li>direct I/O</li> </ul> ? <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-04 07:32:04 </code></pre> On Wed, Apr 4, 2018 at 6:00 PM, Craig Ringer wrote: <blockquote> On 4 April 2018 at 13:29, Thomas Munro wrote: <blockquote> /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); [...] </blockquote> Holy hell. So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us? </blockquote> Predates the opening of the file by the process that calls fsync(). Yeah, it sure looks that way based on the above code fragment. Does anyone know better? <blockquote> Does that mean that the ONLY ways to do reliable I/O are: <ul> <li>single-process, single-file-descriptor write() then fsync(); on failure, retry all work since last successful fsync()</li> </ul> </blockquote> I suppose you could come up with some crazy complicated IPC scheme to make sure that the checkpointer always has an fd older than any writes to be flushed, with some fallback strategy for when it can't take any more fds. I haven't got any good ideas right now. 
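One building block such an IPC scheme could use (an assumption; nothing like this is proposed in the message) is passing an already-open descriptor between processes over a Unix-domain socket with SCM_RIGHTS. The receiver gets a new descriptor for the same open file description, the kernel's struct file, which is where the quoted snippet keeps f_wb_err, so a descriptor opened before the writes could be handed to the checkpointer instead of being reopened later. `send_fd()` and `recv_fd()` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send an open descriptor over a Unix-domain socket (one data byte
 * carries the SCM_RIGHTS control message). Returns 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;    /* forces correct alignment of buf */
    } ctl;
    memset(&ctl, 0, sizeof ctl);
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The hard parts Thomas alludes to, such as deciding which process keeps which descriptors and what to do when the descriptor limit is reached, are not addressed by this sketch.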
<blockquote> <ul> <li>direct I/O</li> </ul> </blockquote> As a bit of an aside, I gather that when you resize files (think truncating/extending relation files) you still need to call fsync() even if you read/write all data with O_DIRECT, to make it flush the filesystem meta-data. I have no idea if that could also be affected by eaten writeback errors. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 07:51:53 </code></pre> On 4 April 2018 at 14:00, Craig Ringer wrote: <blockquote> On 4 April 2018 at 13:29, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:44 PM, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian wrote: <blockquote> Uh, are you sure it fixes our use-case? From the email description it sounded like it only reported fsync errors for every open file descriptor at the time of the failure, but the checkpoint process might open the file after the failure and try to fsync a write that happened before the failure. </blockquote> I'm not sure of anything. I can see that it's designed to report errors since the last fsync() of the file (presumably via any fd), which sounds like the desired behaviour: [..] </blockquote> Scratch that. Whenever you open a file descriptor you can't see any preceding errors at all, because: /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); <a href="https://github.com/torvalds/linux/blob/master/fs/open.c#L752">https://github.com/torvalds/linux/blob/master/fs/open.c#L752</a> Our whole design is based on being able to open, close and reopen files at will from any process, and in particular to fsync() from a different process that didn't inherit the fd but instead opened it later. But it looks like that might be able to eat errors that occurred during asynchronous writeback (when there was nobody to report them to), before you opened the file? </blockquote> Holy hell. 
So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us? I'll see if I can expand my testcase for that. I'm presently dockerizing it to make it easier for others to use, but that turns out to be a major pain when using devmapper etc. Docker in privileged mode doesn't seem to play nice with device-mapper. </blockquote> Done, you can find it in <a href="https://github.com/ringerc/scrapcode/tree/master/testcases/fsync-error-clear">https://github.com/ringerc/scrapcode/tree/master/testcases/fsync-error-clear</a> now. Warning, this runs a Docker container in privileged mode on your system, and it uses devicemapper. Read it before you run it, and while I've tried to keep it safe, beware that it might eat your system. For now it tests only xfs and EIO. Other FSs should be easy enough. I haven't added coverage for multi-processing yet, but given what you found above, I should. I'll probably just system() a copy of the same proc with instructions to only fsync(). I'll do that next. I haven't worked out a reliable way to trigger ENOSPC on fsync() yet, when mapping without the error hole. It happens sometimes but I don't know why, it almost always happens on write() instead. I know it can happen on nfs, but I'm hoping for a saner example than that to test with. ext4 and xfs do delayed allocation but eager reservation so it shouldn't happen to them. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 13:49:38 </code></pre> On Wed, Apr 4, 2018 at 07:32:04PM +1200, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 6:00 PM, Craig Ringer wrote: <blockquote> On 4 April 2018 at 13:29, Thomas Munro wrote: <blockquote> /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); [...] </blockquote> Holy hell. 
So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us? </blockquote> Predates the opening of the file by the process that calls fsync(). Yeah, it sure looks that way based on the above code fragment. Does anyone know better? </blockquote> Uh, just to clarify, what is new here is that it is ignoring any errors that happened before the open(). It is not ignoring write()s that happened but have not been written to storage before the open(). FYI, pg_test_fsync has always tested the ability to fsync() write()s from other processes: <pre><code>Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different descriptor.)
write, fsync, close 5360.341 ops/sec 187 usecs/op
write, close, fsync 4785.240 ops/sec 209 usecs/op
</code></pre> Those two numbers should be similar. I added this as a check to make sure the behavior we were relying on was working. I never tested sync errors though. I think the fundamental issue is that we always assumed that writes to the kernel that could not be written to storage would remain in the kernel until they succeeded, and that fsync() would report their existence. I can understand why kernel developers don't want to keep failed sync buffers in memory, and once they are gone we lose reporting of their failure. Also, if the kernel is going to not retry the syncs, how long should it keep reporting the sync failure? To the first fsync that happens after the failure? How long should it continue to record the failure? What if no fsync() ever happens, which is likely for non-Postgres workloads? I think once they decided to discard failed syncs and not retry them, the fsync behavior we are complaining about was almost required. Our only option might be to tell administrators to closely watch for kernel write failure messages, and then restore or failover.
:-( The last time I remember being this surprised about storage was in the early Postgres years when we learned that just because the BSD file system uses 8k pages doesn't mean those are atomically written to storage. We knew the operating system wrote the data in 8k chunks to storage but: <ul> <li>the 8k pages are written as separate 512-byte sectors</li> <li>the 8k might be contiguous logically on the drive but not physically</li> <li>even 512-byte sectors are not written atomically</li> </ul> This is why we started writing full-page images to WAL, which is what full_page_writes controls. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 13:53:01 </code></pre> On Wed, Apr 4, 2018 at 10:40:16AM +0800, Craig Ringer wrote: <blockquote> The trouble with direct I/O is that it pushes a lot of work back on PostgreSQL regarding knowledge of the storage subsystem, I/O scheduling, etc. It's absurd to have the kernel do this, unless you want it reliable, in which case you bypass it and drive the hardware directly. We'd need pools of writer threads to deal with all the blocking I/O. It'd be such a nightmare. Hey, why bother having a kernel at all, except for drivers? </blockquote> I believe this is how Oracle views the kernel, so there is precedent for this approach, though I am not advocating it. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 14:00:15 </code></pre> On 4 April 2018 at 15:51, Craig Ringer wrote: <blockquote> On 4 April 2018 at 14:00, Craig Ringer wrote: <blockquote> On 4 April 2018 at 13:29, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:44 PM, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian wrote: <blockquote> Uh, are you sure it fixes our use-case?
From the email description it sounded like it only reported fsync errors for every open file descriptor at the time of the failure, but the checkpoint process might open the file after the failure and try to fsync a write that happened before the failure. </blockquote> I'm not sure of anything. I can see that it's designed to report errors since the last fsync() of the file (presumably via any fd), which sounds like the desired behaviour: [..] </blockquote> Scratch that. Whenever you open a file descriptor you can't see any preceding errors at all, because: /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); <a href="https://github.com/torvalds/linux/blob/master/fs/open.c#L752">https://github.com/torvalds/linux/blob/master/fs/open.c#L752</a> Our whole design is based on being able to open, close and reopen files at will from any process, and in particular to fsync() from a different process that didn't inherit the fd but instead opened it later. But it looks like that might be able to eat errors that occurred during asynchronous writeback (when there was nobody to report them to), before you opened the file? </blockquote> Holy hell. So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us? I'll see if I can expand my testcase for that. I'm presently dockerizing it to make it easier for others to use, but that turns out to be a major pain when using devmapper etc. Docker in privileged mode doesn't seem to play nice with device-mapper. </blockquote> Done, you can find it in <a href="https://github.com/ringerc/scrapcode/tree/master/testcases/fsync-error-clear">https://github.com/ringerc/scrapcode/tree/master/testcases/fsync-error-clear</a> now. </blockquote> Update. Now supports multiple FSes. I've tried xfs, jfs, ext3, ext4, even vfat. All behave the same on EIO. Didn't try zfs-on-linux or other platforms yet.
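A minimal sketch of the basic step such a test repeats, assuming a POSIX system (this is not code from the thread or from the fsync-error-clear repository): write a buffer in full, then fsync() it, and record whether an error surfaced at write() time or only at fsync() time. The helper name write_then_fsync and its failed_in_fsync out-parameter are hypothetical.

```c
#define _XOPEN_SOURCE 700  /* expose fsync() and mkstemp() under strict compilers */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>        /* mkstemp(), handy when trying this out */
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf to fd, then fsync(fd).
 * Returns 0 on success or -errno from the call that failed, and sets
 * *failed_in_fsync to 1 only when write() succeeded but fsync() did not.
 */
static int write_then_fsync(int fd, const void *buf, size_t len,
                            int *failed_in_fsync)
{
    const char *p = buf;

    assert(failed_in_fsync != NULL);
    *failed_in_fsync = 0;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any data was written */
            return -errno;         /* error reported at write() time */
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {
        *failed_in_fsync = 1;      /* error only surfaced during writeback */
        return -errno;
    }
    return 0;
}
```

On a healthy disk this returns 0 and leaves *failed_in_fsync at 0. On the failing devices described here, the interesting outcomes are a negative return with *failed_in_fsync set to 1 and, as the rest of the thread shows, a return of 0 even though data was lost.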
Still working on getting ENOSPC on fsync() rather than write(). Kernel code reading suggests this is possible, but all the above FSes reserve space eagerly on write() even if they do delayed allocation of the actual storage, so it doesn't seem to happen at least in my simple single-process test. I'm not overly inclined to complain about an fsync() succeeding after a write() error. That seems reasonable enough; the kernel told the app at the time of the failure. What else is it going to do? I don't personally even object hugely to the current fsync() behaviour if it were, say, DOCUMENTED and conformant to the relevant standards, though not giving us any sane way to find out the affected file ranges makes it drastically harder to recover sensibly. But what's come out since on this thread, that we cannot even rely on fsync() giving us an EIO once when it loses our data, because: <ul> <li>all currently widely deployed kernels can fail to deliver info due to a recently fixed limitation; and</li> <li>the kernel deliberately hides errors from us if they relate to writes that occurred before we opened the FD (?)</li> </ul> ... that's really troubling. I thought we could at least fix this by PANICing on EIO, and was mostly worried about ENOSPC. But now it seems we can't even do that and expect reliability. So what the @#$ are we meant to do? It's the error reporting issues around closing and reopening files with outstanding buffered I/O that's really going to hurt us here. I'll be expanding my test case to cover that shortly. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 14:09:09 </code></pre> On 4 April 2018 at 22:00, Craig Ringer wrote: <blockquote> It's the error reporting issues around closing and reopening files with outstanding buffered I/O that's really going to hurt us here. I'll be expanding my test case to cover that shortly.
</blockquote> Also, just to be clear, this is not in any way confined to xfs and/or lvm as I originally thought it might be. Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't help either (so what does it do?). What bewilders me is that running with data=journal doesn't seem to be safe either. WTF? <pre><code>[26438.846111] EXT4-fs (dm-0): mounted filesystem with journalled data mode. Opts: errors=remount-ro,data_err=abort,data=journal
[26454.125319] EXT4-fs warning (device dm-0): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 59393)
[26454.125326] Buffer I/O error on device dm-0, logical block 59393
[26454.125337] Buffer I/O error on device dm-0, logical block 59394
[26454.125343] Buffer I/O error on device dm-0, logical block 59395
[26454.125350] Buffer I/O error on device dm-0, logical block 59396
</code></pre> and splat, there goes your data anyway. It's possible that this is in some way related to using the device-mapper "error" target and a loopback device in testing. But I don't really see how. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 14:25:47 </code></pre> On Wed, Apr 4, 2018 at 10:09:09PM +0800, Craig Ringer wrote: <blockquote> On 4 April 2018 at 22:00, Craig Ringer wrote: <pre><code>It's the error reporting issues around closing and reopening files with outstanding buffered I/O that's really going to hurt us here. I'll be expanding my test case to cover that shortly. </code></pre> Also, just to be clear, this is not in any way confined to xfs and/or lvm as I originally thought it might be. Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't help either (so what does it do?). </blockquote> Anthony Iliopoulos reported in this thread that errors=remount-ro is only affected by metadata writes.
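Because errors=remount-ro only reacts to metadata write errors, a monitor that merely checks whether a filesystem has gone read-only can miss lost data writes entirely. A minimal sketch of such a check, assuming a POSIX system with statvfs(); fs_is_readonly is a hypothetical helper, not something used in the thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Hypothetical monitoring helper: report whether the filesystem holding
 * `path` is currently mounted read-only.
 * Returns 1 if read-only, 0 if writable, -1 if statvfs() fails.
 * Caveat from the thread: errors=remount-ro reacts to metadata write
 * errors, so failed writeback of file data can leave this returning 0.
 */
static int fs_is_readonly(const char *path)
{
    struct statvfs st;

    assert(path != NULL);
    if (statvfs(path, &st) != 0)
        return -1;
    return (st.f_flag & ST_RDONLY) ? 1 : 0;
}
```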
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 14:42:18 </code></pre> On 4 April 2018 at 22:25, Bruce Momjian wrote: <blockquote> On Wed, Apr 4, 2018 at 10:09:09PM +0800, Craig Ringer wrote: <blockquote> On 4 April 2018 at 22:00, Craig Ringer wrote: <pre><code>It's the error reporting issues around closing and reopening files with outstanding buffered I/O that's really going to hurt us here. I'll be expanding my test case to cover that shortly. </code></pre> Also, just to be clear, this is not in any way confined to xfs and/or lvm as I originally thought it might be. Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't help either (so what does it do?). </blockquote> Anthony Iliopoulos reported in this thread that errors=remount-ro is only affected by metadata writes. </blockquote> Yep, I gathered. I was referring to data_err. <hr /> <pre><code>From:Antonis Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-04 15:23:31 </code></pre> On Wed, Apr 4, 2018 at 4:42 PM, Craig Ringer wrote: <blockquote> On 4 April 2018 at 22:25, Bruce Momjian wrote: <blockquote> On Wed, Apr 4, 2018 at 10:09:09PM +0800, Craig Ringer wrote: <blockquote> On 4 April 2018 at 22:00, Craig Ringer wrote: It's the error reporting issues around closing and reopening files with outstanding buffered I/O that's really going to hurt us here. I'll be expanding my test case to cover that shortly. Also, just to be clear, this is not in any way confined to xfs and/or lvm as I originally thought it might be. Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't help either (so what does it do?). </blockquote> Anthony Iliopoulos reported in this thread that errors=remount-ro is only affected by metadata writes. </blockquote> Yep, I gathered. I was referring to data_err. </blockquote> As far as I recall data_err=abort pertains to the jbd2 handling of potential writeback errors. 
Jbd2 will internally attempt to drain the data upon txn commit (and it's even kind enough to restore the EIO at the address space level, which otherwise would get eaten). When data_err=abort is set, then jbd2 forcibly shuts down the entire journal, with the error being propagated upwards to ext4. I am not sure at which point this would be manifested to userspace and how, but in principle any subsequent fs operations would get some filesystem error due to the journal being down (I would assume similar to remounting the fs read-only). Since you are using data=journal, I would indeed expect to see something more than what you saw in dmesg. I can have a look later; I plan to also respond to some of the other interesting issues that you guys raised in the thread. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-04 15:23:51 </code></pre> On 4 April 2018 at 21:49, Bruce Momjian wrote: <blockquote> On Wed, Apr 4, 2018 at 07:32:04PM +1200, Thomas Munro wrote: <blockquote> On Wed, Apr 4, 2018 at 6:00 PM, Craig Ringer wrote: <blockquote> On 4 April 2018 at 13:29, Thomas Munro wrote: <blockquote> /* Ensure that we skip any errors that predate opening of the file */ f->f_wb_err = filemap_sample_wb_err(f->f_mapping); [...] </blockquote> Holy hell. So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us? </blockquote> Predates the opening of the file by the process that calls fsync(). Yeah, it sure looks that way based on the above code fragment. Does anyone know better? </blockquote> Uh, just to clarify, what is new here is that it is ignoring any errors that happened before the open(). It is not ignoring write()s that happened but have not been written to storage before the open().
FYI, pg_test_fsync has always tested the ability to fsync() write()s from other processes: <pre><code>Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different descriptor.)
write, fsync, close 5360.341 ops/sec 187 usecs/op
write, close, fsync 4785.240 ops/sec 209 usecs/op
</code></pre> Those two numbers should be similar. I added this as a check to make sure the behavior we were relying on was working. I never tested sync errors though. I think the fundamental issue is that we always assumed that writes to the kernel that could not be written to storage would remain in the kernel until they succeeded, and that fsync() would report their existence. I can understand why kernel developers don't want to keep failed sync buffers in memory, and once they are gone we lose reporting of their failure. Also, if the kernel is going to not retry the syncs, how long should it keep reporting the sync failure? </blockquote> Ideally until the app tells it not to. But there's no standard API for that. The obvious answer seems to be "until the FD is closed". But we just discussed how Pg relies on being able to open and close files freely. That may not be as reasonable a thing to do as we thought it was when you consider error reporting. What's the kernel meant to do? How long should it remember "I had an error while doing writeback on this file"? Should it flag the file metadata and remember across reboots? Obviously not, but where does it stop? Tell the next program that does an fsync() and forget? How could it associate a dirty buffer on a file with no open FDs with any particular program at all? And what if the app did a write then closed the file and went away, never to bother to check the file again, like most apps do? Some I/O errors are transient (network issue, etc). Some are recoverable with some sort of action, like disk space issues, but may take a long time before an admin steps in.
Some are entirely unrecoverable (disk 1 in a striped array is on fire) and there's no possible recovery. Currently we kind of hope the kernel will deal with figuring out which is which and retrying. Turns out it doesn't do that so much, and I don't think the reasons for that are wholly unreasonable. We may have been asking too much. That does leave us in a pickle when it comes to the checkpointer and opening/closing FDs. I don't know what the "right" thing for the kernel to do from our perspective even is here, but the best I can come up with is actually pretty close to what it does now. Report the fsync() error to the first process that does an fsync() since the writeback error if one has occurred, then forget about it. Ideally I'd have liked it to mark all FDs pointing to the file with a flag to report EIO on next fsync too, but it turns out that won't even help us due to our opening and closing behaviour, so we're going to have to take responsibility for handling and communicating that ourselves, preventing checkpoint completion if any backend gets an fsync error. Probably by PANICing. Some extra work may be needed to ensure reliable ordering and stop checkpoints completing if their fsync() succeeds due to a recent failed fsync() on a normal backend that hasn't PANICed or where the postmaster hasn't noticed yet. <blockquote> Our only option might be to tell administrators to closely watch for kernel write failure messages, and then restore or failover. :-( </blockquote> Speaking of, there's not necessarily any lost page write error in the logs AFAICS. My tests often just show "Buffer I/O error on device dm-0, logical block 59393" or the like. <hr /> <pre><code>From:Gasper Zejn <zejn(at)owca(dot)info> Date:2018-04-04 17:23:58 </code></pre> On 04. 04. 2018 15:49, Bruce Momjian wrote: <blockquote> I can understand why kernel developers don't want to keep failed sync buffers in memory, and once they are gone we lose reporting of their failure.
Also, if the kernel is going to not retry the syncs, how long should it keep reporting the sync failure? To the first fsync that happens after the failure? How long should it continue to record the failure? What if no fsync() ever happens, which is likely for non-Postgres workloads? I think once they decided to discard failed syncs and not retry them, the fsync behavior we are complaining about was almost required. </blockquote> Ideally the kernel would keep its data for as little time as possible. With fsync, it doesn't really know which process is interested in knowing about a write error; it just assumes the caller will know how to deal with it. The most unfortunate issue is that there's no way to get information about a write error. Thinking aloud - couldn't/shouldn't a write error also be a file system event reported by inotify? Admittedly that's only a thing on Linux, but still. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-04 17:51:03 </code></pre> On Wed, Apr 4, 2018 at 11:23:51PM +0800, Craig Ringer wrote: <blockquote> On 4 April 2018 at 21:49, Bruce Momjian wrote: <blockquote> I can understand why kernel developers don't want to keep failed sync buffers in memory, and once they are gone we lose reporting of their failure. Also, if the kernel is going to not retry the syncs, how long should it keep reporting the sync failure? </blockquote> Ideally until the app tells it not to. But there's no standard API for that. </blockquote> You would almost need an API that registers before the failure that you care about sync failures, and that you plan to call fsync() to gather such information. I am not sure how you would allow more than the first fsync() to see the failure unless you added another API to clear the fsync failure, but I don't see the point since the first fsync() might call that clear function. How many applications are going to know there is another application that cares about the failure? Not many.
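A heavily simplified model of the reporting policy under discussion, written for illustration only. It assumes the Linux 4.13-era behaviour quoted earlier in the thread (a newly opened fd samples the error state at open time, so older errors are skipped) and is not the kernel's errseq_t implementation; the names wb_mapping, wb_file, wb_open and wb_fsync are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of per-file writeback error
 * reporting. It is NOT the kernel's errseq_t code.
 */
struct wb_mapping {          /* one per file, like the inode's mapping */
    unsigned seq;            /* bumped on every writeback error */
    int last_err;            /* most recent error, e.g. EIO */
};

struct wb_file {             /* one per open(), like f->f_wb_err */
    struct wb_mapping *m;
    unsigned seen;           /* value of seq when this handle last looked */
};

/* Asynchronous writeback failed; nobody may be around to hear about it. */
static void wb_record_error(struct wb_mapping *m, int err)
{
    assert(m != NULL);
    m->last_err = err;
    m->seq++;
}

/* open(): sample the current counter, so earlier errors are skipped. */
static struct wb_file wb_open(struct wb_mapping *m)
{
    struct wb_file f = { m, m->seq };
    return f;
}

/* fsync(): report, once per handle, any error newer than its sample. */
static int wb_fsync(struct wb_file *f)
{
    if (f->seen != f->m->seq) {
        f->seen = f->m->seq;
        return -f->m->last_err;
    }
    return 0;
}
```

In this model a handle opened before the error (say, a backend) gets -EIO from its first wb_fsync() and 0 afterwards, while a handle opened after the error (say, a checkpointer that reopened the segment) never sees it.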
<blockquote> Currently we kind of hope the kernel will deal with figuring out which is which and retrying. Turns out it doesn't do that so much, and I don't think the reasons for that are wholly unreasonable. We may have been asking too much. </blockquote> Agreed. <blockquote> <blockquote> Our only option might be to tell administrators to closely watch for kernel write failure messages, and then restore or failover. :-( </blockquote> Speaking of, there's not necessarily any lost page write error in the logs AFAICS. My tests often just show "Buffer I/O error on device dm-0, logical block 59393" or the like. </blockquote> I assume that is the kernel logs. I am thinking the kernel logs have to be monitored, but how many administrators do that? The other issue I think you are pointing out is how is the administrator going to know this is a Postgres file? I guess any sync error to a device that contains Postgres has to assume Postgres is corrupted. :-( <hr /> see explicit treatment of retrying, though I'm not entirely sure if the retry flag is set just for async write-back), and apparently unlike every other kernel I've tried to grok so far (things descended from ancestral BSD but not descended from FreeBSD, with macOS/Darwin apparently in the first category for this purpose). Here's a new ticket in the NetBSD bug database for this stuff: <a href="http://gnats.netbsd.org/53152">http://gnats.netbsd.org/53152</a> As mentioned in that ticket and by Andres earlier in this thread, keeping the page dirty isn't the only strategy that would work and may be problematic in different ways (it tells the truth but floods your cache with unflushable stuff until eventually you force unmount it and your buffers are eventually invalidated after ENXIO errors? I don't know.). I have no qualified opinion on that. I just know that we need a way for fsync() to tell the truth about all preceding writes or our checkpoints are busted. 
*We mmap() + msync() in pg_flush_data() if you don't have sync_file_range(), and I see now that that is probably not a great idea on ZFS because you'll finish up double-buffering (or is that triple-buffering?), flooding your page cache with transient data. Oops. That is off-topic and not relevant for the checkpoint correctness topic of this thread though, since pg_flush_data() is advisory only. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-04 22:14:24 </code></pre> On Thu, Apr 5, 2018 at 9:28 AM, Thomas Munro wrote: <blockquote> On Thu, Apr 5, 2018 at 2:00 AM, Craig Ringer wrote: <blockquote> I've tried xfs, jfs, ext3, ext4, even vfat. All behave the same on EIO. Didn't try zfs-on-linux or other platforms yet. </blockquote> While contemplating what exactly it would do (not sure), </blockquote> See manual for failmode=wait | continue | panic. Even "continue" returns EIO to all new write requests, so they apparently didn't bother to supply an 'eat-my-data-but-tell-me-everything-is-fine' mode. Figures. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-05 07:09:57 </code></pre> Summary to date: It's worse than I thought originally, because: <ul> <li>Most widely deployed kernels have cases where they don't tell you about losing your writes at all; and</li> <li>Information about loss of writes can be masked by closing and re-opening a file</li> </ul> So the checkpointer cannot trust that a successful fsync() means ... a successful fsync(). Also, it's been reported to me off-list that anyone on the system calling sync(2) or the sync shell command will also generally consume the write error, causing us not to see it when we fsync(). The same is true for /proc/sys/vm/drop_caches. I have not tested these yet. There's some level of agreement that we should PANIC on fsync() errors, at least on Linux, but likely everywhere. But we also now know it's insufficient to be fully protective.
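To make the retry hazard concrete, here is a toy simulation contrasting a retry-on-next-checkpoint policy with a fail-fast (PANIC) policy. Everything in it is hypothetical: fake_fsync stands in for a kernel that reports EIO once and then forgets, as described above, and neither policy function is PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for fsync(): returns 0 or -errno. */
typedef int (*sync_fn)(void *ctx);

/*
 * Policy A (hypothetical): treat a failed fsync() as retryable and try
 * again on the next checkpoint. Returns the last result; *attempts
 * counts how many calls were made.
 */
static int checkpoint_with_retry(sync_fn do_sync, void *ctx,
                                 int max_attempts, int *attempts)
{
    int rc = -EIO;

    assert(max_attempts >= 1);
    for (*attempts = 0; *attempts < max_attempts; ) {
        (*attempts)++;
        rc = do_sync(ctx);
        if (rc == 0)
            break;                 /* "checkpoint complete", data maybe lost */
    }
    return rc;
}

/*
 * Policy B (hypothetical): the first failure is final; a real server
 * would PANIC here so that crash recovery redoes the work from WAL.
 */
static int checkpoint_fail_fast(sync_fn do_sync, void *ctx)
{
    return do_sync(ctx);
}

/* Simulated kernel: reports EIO once, then forgets. ctx is a call counter. */
static int fake_fsync(void *ctx)
{
    int *calls = ctx;
    return (*calls)++ == 0 ? -EIO : 0;
}
```

With this fake, the retrying checkpoint reports success on its second attempt, which is exactly the false success the thread is worried about, while the fail-fast version surfaces -EIO. As the summary notes, even fail-fast only helps if the error is reported to this process at all.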
I previously thought that errors=remount-ro was a sufficient safeguard. It isn't. There doesn't seem to be anything that is, for ext3, ext4, btrfs or xfs. It's not clear to me yet why data_err=abort isn't sufficient in data=ordered or data=writeback mode on ext3 or ext4; it needs more digging. (In my test tools that's: make FSTYPE=ext4 MKFSOPTS="" MOUNTOPTS="errors=remount-ro,data_err=abort,data=journal" as of the current version d7fe802ec). AFAICS that's because data_err=abort only affects data=ordered, not data=journal. If you use data=ordered, you at least get retries of the same write failing. This post <a href="https://lkml.org/lkml/2008/10/10/80">https://lkml.org/lkml/2008/10/10/80</a> added the option and has some explanation, but doesn't explain why it doesn't affect data=journal. zfs is probably not affected by the issues, per Thomas Munro. I haven't run my test scripts on it yet because my kernel doesn't have zfs support and I'm prioritising the multi-process / open-and-close issues. So far none of the FSes and options I've tried exhibit the behaviour I actually want, which is to make the fs readonly or inaccessible on I/O error. ENOSPC doesn't seem to be a concern during normal operation of major file systems (ext3, ext4, btrfs, xfs) because they reserve space before returning from write(). But if a buffered write does manage to fail due to ENOSPC we'll definitely see the same problems. This makes ENOSPC on NFS a potentially data-corrupting condition since NFS doesn't preallocate space before returning from write(). I think what we really need is a block-layer fix, where an I/O error flips the block device into read-only mode, as if blockdev --setro had been used. Though I'd settle for a kernel panic, frankly. I don't think anybody really wants this, but I'd rather have either of those than silent data loss. I'm currently tweaking my test to close and reopen the file between each write() and fsync(), and to support running with nfs.
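For the ENOSPC case, one possible mitigation (an assumption, not something tested in the thread) is to ask the filesystem to allocate space up front with posix_fallocate(), so that a shortage is reported immediately rather than during writeback. Caveat: where the filesystem does not support allocation, glibc may emulate posix_fallocate() by writing to the file, which is itself ordinary buffered I/O and gives no such guarantee. reserve_space is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700  /* posix_fallocate(), mkstemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the filesystem to allocate `len` bytes for fd
 * now, so a shortage of space is reported here as ENOSPC instead of during
 * later writeback. Returns 0 or a positive errno value; posix_fallocate()
 * returns the error number directly rather than setting errno.
 */
static int reserve_space(int fd, off_t len)
{
    assert(len > 0);
    return posix_fallocate(fd, 0, len);
}
```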
I've also just found the device-mapper "flakey" driver, which looks fantastic for simulating unreliable I/O with intermittent faults. I've been using the "error" target in a mapping, which lets me remap some of the device to always error, but "flakey" looks very handy for actual PostgreSQL testing. For the sake of Google, these are errors known to be associated with the problem: ext4, and ext3 mounted with ext4 driver: <pre><code>[42084.327345] EXT4-fs warning (device dm-0): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 59393)
[42084.327352] Buffer I/O error on device dm-0, logical block 59393
</code></pre> xfs: <pre><code>[42193.771367] XFS (dm-0): writeback error on sector 118784
[42193.784477] XFS (dm-0): writeback error on sector 118784
</code></pre> jfs: (nil, silence in the kernel logs) You should also beware of "lost page write" or "lost write" errors. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-05 08:46:08 </code></pre> On 5 April 2018 at 15:09, Craig Ringer wrote: <blockquote> Also, it's been reported to me off-list that anyone on the system calling sync(2) or the sync shell command will also generally consume the write error, causing us not to see it when we fsync(). The same is true for /proc/sys/vm/drop_caches. I have not tested these yet. </blockquote> I just confirmed this with a tweak to the test that <pre><code>records the file position
close()s the fd
sync()s
open()s the file
lseek()s back to the recorded position
</code></pre> This causes the test to completely ignore the I/O error, which is not reported to it at any time. Fair enough, really, when you look at it from the kernel's point of view. What else can it do? Nobody has the file open. It'd have to mark the file itself as bad somehow. But that's pretty bad for our robustness AFAICS. <blockquote> There's some level of agreement that we should PANIC on fsync() errors, at least on Linux, but likely everywhere.
But we also now know it's insufficient to be fully protective. </blockquote> If dirty writeback fails between our close() and re-open() I see the same behaviour as with sync(). To test that I set dirty_writeback_centisecs and dirty_expire_centisecs to 1 and added a usleep(3*100*1000) between close() and open(). (It's still plenty slow.) So sync() is a convenient way to simulate something other than our own fsync() writing out the dirty buffer. If I omit the sync() then we get the error reported by fsync() once when we re-open() the file and fsync() it, because the buffers weren't written out yet, so the error wasn't generated until we re-open()ed the file. But I doubt that'll happen much in practice because dirty writeback will get to it first so the error will be seen and discarded before we reopen the file in the checkpointer. In other words, it looks like even with a new kernel with the error reporting bug fixes, if I understand how the backends and checkpointer interact when it comes to file descriptors, we're unlikely to notice I/O errors and fail a checkpoint. We may notice I/O errors if a backend does its own eager writeback for large I/O operations, or if the checkpointer fsync()s a file before the kernel's dirty writeback gets around to trying to flush the pages that will fail. I haven't tested anything with multiple processes / multiple FDs yet, where we keep one fd open while writing on another. But at this point I don't see any way to make Pg reliably detect I/O errors and fail a checkpoint then redo and retry. To even fix this by PANICing like I proposed originally, we need to know we have to PANIC. AFAICS it's completely unsafe to write(), close(), open() and fsync() and expect that the fsync() makes any promises about the write(). Which, if I read Pg's low-level storage code right, makes it completely unable to reliably detect I/O errors. When you put it that way, it sounds fair enough too.
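A minimal sketch of the record/close/sync/reopen/lseek sequence described above, assuming a POSIX system. reopen_after_sync is a hypothetical helper, not Craig's test code. On the kernels discussed here, a writeback error consumed by sync() while no descriptor was open would not be reported by a later fsync() on the returned descriptor.

```c
#define _XOPEN_SOURCE 700  /* sync(), fsync(), mkstemp() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: close fd, sync() the whole system, reopen path
 * and seek back to the old position. Returns the new fd or -errno.
 */
static int reopen_after_sync(int fd, const char *path)
{
    assert(path != NULL);

    off_t pos = lseek(fd, 0, SEEK_CUR);      /* record the file position */
    if (pos < 0)
        return -errno;
    if (close(fd) != 0)                      /* close() the fd */
        return -errno;

    sync();                                  /* system-wide writeback; may consume errors */

    int nfd = open(path, O_RDWR);            /* open() the file again */
    if (nfd < 0)
        return -errno;
    if (lseek(nfd, pos, SEEK_SET) < 0) {     /* lseek() back to the recorded position */
        int err = errno;
        close(nfd);
        return -err;
    }
    return nfd;
}
```

A caller would then keep writing or call fsync() on the returned descriptor; the descriptor number may differ from the original one.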
How long is the kernel meant to remember that there was a write error on the file triggered by a write initiated by some seemingly unrelated process, some unbounded time ago, on a since-closed file? But it seems to put Pg on the fast track to O_DIRECT. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-05 19:33:14 </code></pre> On Thu, Apr 5, 2018 at 03:09:57PM +0800, Craig Ringer wrote: <blockquote> ENOSPC doesn't seem to be a concern during normal operation of major file systems (ext3, ext4, btrfs, xfs) because they reserve space before returning from write(). But if a buffered write does manage to fail due to ENOSPC we'll definitely see the same problems. This makes ENOSPC on NFS a potentially data corrupting condition since NFS doesn't preallocate space before returning from write(). </blockquote> This does explain why NFS has a reputation for unreliability for Postgres. <hr /> <pre><code>From:Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> Date:2018-04-05 23:37:42 </code></pre> Note: as I've brought up in another thread, it turns out that PG is not handling fsync errors correctly even when the OS does do the right thing (discovered by testing on FreeBSD). <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-06 01:27:05 </code></pre> On 6 April 2018 at 07:37, Andrew Gierth wrote: <blockquote> Note: as I've brought up in another thread, it turns out that PG is not handling fsync errors correctly even when the OS does do the right thing (discovered by testing on FreeBSD). </blockquote> Yikes. For other readers, the related thread for this is Meanwhile, I've extended my test to run postgres on a deliberately faulty volume and confirmed my results there. 
<pre><code>2018-04-06 01:11:40.555 UTC [58] LOG: checkpoint starting: immediate force wait
2018-04-06 01:11:40.567 UTC [58] ERROR: could not fsync file "base/12992/16386": Input/output error
2018-04-06 01:11:40.655 UTC [66] ERROR: checkpoint request failed
2018-04-06 01:11:40.655 UTC [66] HINT: Consult recent messages in the server log for details.
2018-04-06 01:11:40.655 UTC [66] STATEMENT: CHECKPOINT
Checkpoint failed with checkpoint request failed
HINT: Consult recent messages in the server log for details.
Retrying
2018-04-06 01:11:41.568 UTC [58] LOG: checkpoint starting: immediate force wait
2018-04-06 01:11:41.614 UTC [58] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.046 s; sync files=3, longest=0.000 s, average=0.000 s; distance=2727 kB, estimate=2779 kB
</code></pre> Given your report, now I have to wonder if we even reissued the fsync() at all this time. 'perf' time. OK, with <pre><code>sudo perf record -e syscalls:sys_enter_fsync,syscalls:sys_exit_fsync -a
sudo perf script
</code></pre> I see the failed fsync, then the same fd being fsync()d without error on the next checkpoint, which succeeds. <pre><code> postgres 9602 [003] 72380.325817: syscalls:sys_enter_fsync: fd: 0x00000005
 postgres 9602 [003] 72380.325931: syscalls:sys_exit_fsync: 0xfffffffffffffffb
 ...
 postgres 9602 [000] 72381.336767: syscalls:sys_enter_fsync: fd: 0x00000005
 postgres 9602 [000] 72381.336840: syscalls:sys_exit_fsync: 0x0
</code></pre> ... and Pg continues merrily on its way without realising it lost data: <pre><code>[72379.834872] XFS (dm-0): writeback error on sector 118752
[72380.324707] XFS (dm-0): writeback error on sector 118688
</code></pre> In this test I set things up so the checkpointer would see the first fsync() error.
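The perf trace prints raw 64-bit syscall return values, and a negative return means -errno. A small helper, assuming 64-bit Linux where EIO is 5, shows how to read them: 0xfffffffffffffffb reinterpreted as a signed 64-bit value is -5, that is -EIO, while 0x0 is success. decode_syscall_ret is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert a raw syscall return value as printed by perf (a 64-bit hex
 * number) into a signed result. Negative values are -errno.
 */
static long long decode_syscall_ret(const char *hex)
{
    assert(hex != NULL);
    uint64_t raw = strtoull(hex, NULL, 16);   /* base 16 accepts a leading "0x" */
    int64_t value;
    memcpy(&value, &raw, sizeof value);       /* reinterpret the bits as two's complement */
    return (long long)value;
}
```

Applied to the trace, the first sys_exit_fsync decodes to -EIO and the second to 0, which is the successful retry that let the checkpoint complete.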
But if I make checkpoints less frequent, the bgwriter aggressive, and kernel dirty writeback aggressive, it should be possible to have the failure go completely unobserved too. I'll try that next, because we've already largely concluded that the solution to the issue above is to PANIC on fsync() error. But if we don't see the error at all we're in trouble. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-06 02:53:56 </code></pre> On Fri, Apr 6, 2018 at 1:27 PM, Craig Ringer wrote: <blockquote> On 6 April 2018 at 07:37, Andrew Gierth wrote: <blockquote> Note: as I've brought up in another thread, it turns out that PG is not handling fsync errors correctly even when the OS does do the right thing (discovered by testing on FreeBSD). </blockquote> Yikes. For other readers, the related thread for this is </blockquote> Yeah. That's really embarrassing, especially after beating up on various operating systems all week. It's also an independent issue -- let's keep that on the other thread and get it fixed. <blockquote> I see the failed fsync(), then the same fd being fsync()d without error on the next checkpoint, which succeeds. <pre><code>postgres 9602 [003] 72380.325817: syscalls:sys_enter_fsync: fd: 0x00000005
postgres 9602 [003] 72380.325931: syscalls:sys_exit_fsync: 0xfffffffffffffffb
...
postgres 9602 [000] 72381.336767: syscalls:sys_enter_fsync: fd: 0x00000005
postgres 9602 [000] 72381.336840: syscalls:sys_exit_fsync: 0x0
</code></pre> ... and Pg continues merrily on its way without realising it lost data: <pre><code>[72379.834872] XFS (dm-0): writeback error on sector 118752
[72380.324707] XFS (dm-0): writeback error on sector 118688
</code></pre> In this test I set things up so the checkpointer would see the first fsync() error. But if I make checkpoints less frequent, the bgwriter aggressive, and kernel dirty writeback aggressive, it should be possible to have the failure go completely unobserved too.
I'll try that next, because we've already largely concluded that the solution to the issue above is to PANIC on fsync() error. But if we don't see the error at all we're in trouble. </blockquote> I suppose you only see errors because the file descriptors linger open in the virtual file descriptor cache, which is a matter of luck depending on how many relation segment files you touched. One thing you could try to confirm our understanding of the Linux 4.13+ policy would be to hack PostgreSQL so that it reopens the file descriptor every time in mdsync(). See attached. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-06 03:20:22 </code></pre> On 6 April 2018 at 10:53, Thomas Munro wrote: <blockquote> On Fri, Apr 6, 2018 at 1:27 PM, Craig Ringer wrote: <blockquote> On 6 April 2018 at 07:37, Andrew Gierth wrote: <blockquote> Note: as I've brought up in another thread, it turns out that PG is not handling fsync errors correctly even when the OS does do the right thing (discovered by testing on FreeBSD). </blockquote> Yikes. For other readers, the related thread for this is news-spur.riddles.org.uk </blockquote> Yeah. That's really embarrassing, especially after beating up on various operating systems all week. It's also an independent issue -- let's keep that on the other thread and get it fixed. <blockquote> I see the failed fsync(), then the same fd being fsync()d without error on the next checkpoint, which succeeds. <pre><code>postgres 9602 [003] 72380.325817: syscalls:sys_enter_fsync: fd: 0x00000005
postgres 9602 [003] 72380.325931: syscalls:sys_exit_fsync: 0xfffffffffffffffb
...
postgres 9602 [000] 72381.336767: syscalls:sys_enter_fsync: fd: 0x00000005
postgres 9602 [000] 72381.336840: syscalls:sys_exit_fsync: 0x0
</code></pre> ...
and Pg continues merrily on its way without realising it lost data: <pre><code>[72379.834872] XFS (dm-0): writeback error on sector 118752
[72380.324707] XFS (dm-0): writeback error on sector 118688
</code></pre> In this test I set things up so the checkpointer would see the first fsync() error. But if I make checkpoints less frequent, the bgwriter aggressive, and kernel dirty writeback aggressive, it should be possible to have the failure go completely unobserved too. I'll try that next, because we've already largely concluded that the solution to the issue above is to PANIC on fsync() error. But if we don't see the error at all we're in trouble. </blockquote> I suppose you only see errors because the file descriptors linger open in the virtual file descriptor cache, which is a matter of luck depending on how many relation segment files you touched. </blockquote> In this case I think it's because the kernel didn't get around to doing the writeback before the eagerly forced checkpoint fsync()'d it. Or we didn't even queue it for writeback from our own shared_buffers until just before we fsync()'d it. After all, it's a contrived test case that tries to reproduce the issue rapidly with big writes and frequent checkpoints. So the checkpointer had the relation open to fsync() it, and it was the checkpointer's fsync() that did writeback on the dirty page and noticed the error. If the kernel had done the writeback before the checkpointer opened the relation to fsync() it, we might not have seen the error at all, though as you note this depends on the file descriptor cache. You can see the silent-error behaviour in my standalone test case, where I confirmed the post-4.13 behaviour (I'm on 4.14 here). I can try to reproduce it with postgres too, but it not only requires closing and reopening the FDs, it also requires forcing writeback before opening the fd.
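To make the timing argument concrete, here is a deliberately simplified C model of the Linux 4.13/4.14 behaviour as described in this thread: each inode carries a writeback-error counter, each open file samples it at open() time (the per-file f_wb_err), and fsync() reports an error only if the counter has moved since that file last looked. This is an illustration under those assumptions, not the kernel's actual errseq_t code; every name in it is hypothetical.

```c
#include <stdbool.h>

/* Simplified, hypothetical model; NOT the kernel's errseq_t implementation. */
struct model_inode { unsigned wb_err_seq; };  /* bumped on each writeback error */
struct model_file  { struct model_inode *inode; unsigned seen_seq; };

/* open(): sample the counter, so only later errors are visible. */
static struct model_file model_open(struct model_inode *ino)
{
    struct model_file f = { ino, ino->wb_err_seq };
    return f;
}

/* Background writeback fails: the failed pages are marked clean and
 * only the counter remembers that something went wrong. */
static void model_writeback_error(struct model_inode *ino)
{
    ino->wb_err_seq++;
}

/* fsync(): true means "would return EIO". The error is reported once per
 * open file, after which that file's sample catches up. */
static bool model_fsync_reports_error(struct model_file *f)
{
    bool err = f->seen_seq != f->inode->wb_err_seq;
    f->seen_seq = f->inode->wb_err_seq;
    return err;
}
```

In this model a file opened before the failed writeback sees EIO exactly once and then a clean second fsync(), as in the perf trace; a file opened afterwards, like a checkpointer that reopens a relation after the kernel has already done the writeback, never sees the error at all.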
To make it occur in a practical timeframe I have to make my kernel writeback settings insanely aggressive and/or call sync() before re-open()ing. I don't really think it's worth it, since I've confirmed the behaviour already with the simpler test in standalone/ in the test repo. To try it yourself, clone <a href="https://github.com/ringerc/scrapcode">https://github.com/ringerc/scrapcode</a> and in the master branch run: <pre><code>cd testcases/fsync-error-clear
less README
make REOPEN=reopen standalone-run
</code></pre> See <a href="https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear/standalone/fsync-error-clear.c#L118">https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear/standalone/fsync-error-clear.c#L118</a>. I've pushed the postgres test to that repo too; "make postgres-run". You'll need docker, and be warned: it's using privileged docker containers and messing with dmsetup. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-08 02:16:07 </code></pre> So, what can we actually do about this new Linux behaviour? Idea 1: <ul> <li>whenever you open a file, either tell the checkpointer so it can open it too (and wait for it to tell you that it has done so, because it's not safe to write() until then), or send it a copy of the file descriptor via IPC (since duplicated file descriptors share the same f_wb_err)</li> <li>if the checkpointer can't take any more file descriptors (how would that limit even work in the IPC case?), then it somehow needs to tell you that so that you know that you're responsible for fsyncing that file yourself, both on close (due to fd cache recycling) and also when the checkpointer tells you to</li> </ul> Maybe it could be made to work, but sheesh, that seems horrible.
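For the IPC variant of Idea 1, the standard POSIX way to hand a descriptor to another process is SCM_RIGHTS ancillary data over a UNIX-domain socket. The receiver gets a new descriptor referring to the same open file description, the same sharing dup() gives, and therefore (per the reasoning above) the same f_wb_err. A minimal sketch; send_fd() and recv_fd() are hypothetical helper names, and error handling is reduced to returning -1.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>   /* close(), read(), write() for callers */

/* Hypothetical helper: pass one descriptor to the socket's peer as
 * SCM_RIGHTS ancillary data. Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';  /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {           /* control buffer with correct cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

This only covers the handoff itself; the harder parts Thomas lists (waiting for the checkpointer's acknowledgement before write(), and what to do when it runs out of descriptors) are untouched by it.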
Is there some simpler idea along these lines that could make sure that fsync() is only ever called on file descriptors that were opened before all unflushed writes, or file descriptors cloned from such file descriptors? Idea 2: Give up, complain that this implementation is defective and unworkable, both on POSIX-compliance grounds and on POLA grounds, and campaign to get it fixed more fundamentally (actual details left to the experts, no point in speculating here, but we've seen a few approaches that work on other operating systems including keeping buffers dirty and marking the whole filesystem broken/read-only). Idea 3: Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP. Any other ideas? For a while I considered suggesting an idea which I now think doesn't work. I thought we could try asking for a new fcntl interface that spits out the wb_err counter. Call it an opaque error token or something. Then we could store it in our fsync queue and safely close the file. Check again before fsync()ing, and if we ever see a different value, PANIC because it means a writeback error happened while we weren't looking. Sadly I think it doesn't work because AIUI inodes are not pinned in kernel memory when no one has the file open and there are no dirty buffers, so I think the counters could go away and be reset. Perhaps you could keep inodes pinned by keeping the associated buffers dirty after an error (like FreeBSD), but if you did that you'd have solved the problem already and wouldn't really need the wb_err system at all. Is there some other idea along these lines that could work? <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-08 02:33:37 </code></pre> On Sun, Apr 8, 2018 at 02:16:07PM +1200, Thomas Munro wrote: <blockquote> So, what can we actually do about this new Linux behaviour?
Idea 1: <ul> <li>whenever you open a file, either tell the checkpointer so it can open it too (and wait for it to tell you that it has done so, because it's not safe to write() until then), or send it a copy of the file descriptor via IPC (since duplicated file descriptors share the same f_wb_err)</li> <li>if the checkpointer can't take any more file descriptors (how would that limit even work in the IPC case?), then it somehow needs to tell you that so that you know that you're responsible for fsyncing that file yourself, both on close (due to fd cache recycling) and also when the checkpointer tells you to</li> </ul> Maybe it could be made to work, but sheesh, that seems horrible. Is there some simpler idea along these lines that could make sure that fsync() is only ever called on file descriptors that were opened before all unflushed writes, or file descriptors cloned from such file descriptors? Idea 2: Give up, complain that this implementation is defective and unworkable, both on POSIX-compliance grounds and on POLA grounds, and campaign to get it fixed more fundamentally (actual details left to the experts, no point in speculating here, but we've seen a few approaches that work on other operating systems including keeping buffers dirty and marking the whole filesystem broken/read-only). Idea 3: Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP. </blockquote> Idea 4 would be for people to assume their database is corrupt if their server logs report any I/O error on the file systems Postgres uses. <hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-08 02:37:47 </code></pre> <blockquote> On Apr 7, 2018, at 19:33, Bruce Momjian wrote: Idea 4 would be for people to assume their database is corrupt if their server logs report any I/O error on the file systems Postgres uses. </blockquote> Pragmatically, that's where we are right now. 
The best answer in this bad situation is (a) fix the error, then (b) replay from a checkpoint before the error occurred, but it appears we can't even guarantee that a PostgreSQL process will be the one to see the error. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-08 03:27:45 </code></pre> On 8 April 2018 at 10:16, Thomas Munro wrote: <blockquote> So, what can we actually do about this new Linux behaviour? </blockquote> Yeah, I've been cooking over that myself. More below, but here's an idea #5: decide InnoDB has the right idea, and go to using a single massive blob file, or a few giant blobs. We have a storage abstraction that makes this way, way less painful than it should be. We can virtualize relfilenodes into storage extents in relatively few big files. We could use sparse regions to make the addressing more convenient, but that makes copying and backup painful, so I'd rather not. Even one file per tablespace for persistent relation heaps, another for indexes, another for each fork type. That way we can use something like your #1 (which is what I was also thinking about then rejecting previously), but reduce the pain by reducing the FD count drastically so exhausting FDs stops being a problem.
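The address translation behind idea #5 is simple in principle. Below is a hypothetical C sketch, with invented names, of mapping a (relfilenode, block number) pair to a position inside one of a few large files. It assumes PostgreSQL's default 8 kB block size and a linear search over an in-memory extent list, and says nothing about how extents would be allocated, persisted, or WAL-logged.

```c
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192   /* assumption: PostgreSQL's default block size */

/* Hypothetical extent record: a run of one relation's blocks stored
 * contiguously inside one of a few big files. */
struct extent {
    uint32_t relfilenode;   /* which relation */
    uint32_t first_block;   /* first relation block covered by this extent */
    uint32_t nblocks;       /* extent length in blocks */
    uint32_t file_no;       /* which big file */
    uint64_t file_offset;   /* byte offset of first_block in that file */
};

/* Translate (rel, block) to (file_no, byte offset).
 * Returns 0 on success, -1 if no extent covers the block. */
static int extent_lookup(const struct extent *map, size_t n,
                         uint32_t rel, uint32_t block,
                         uint32_t *file_no, uint64_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        const struct extent *e = &map[i];
        if (e->relfilenode == rel && block >= e->first_block &&
            block - e->first_block < e->nblocks) {
            *file_no = e->file_no;
            *offset = e->file_offset +
                      (uint64_t)(block - e->first_block) * BLCKSZ;
            return 0;
        }
    }
    return -1;
}
```

The point for the fsync problem is that however many relations exist, the checkpointer only needs descriptors for the handful of big files, so keeping them all open permanently becomes feasible.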
Previously I was leaning toward what you've described here: <blockquote> <ul> <li>whenever you open a file, either tell the checkpointer so it can open it too (and wait for it to tell you that it has done so, because it's not safe to write() until then), or send it a copy of the file descriptor via IPC (since duplicated file descriptors share the same f_wb_err)</li> <li>if the checkpointer can't take any more file descriptors (how would that limit even work in the IPC case?), then it somehow needs to tell you that so that you know that you're responsible for fsyncing that file yourself, both on close (due to fd cache recycling) and also when the checkpointer tells you to</li> </ul> Maybe it could be made to work, but sheesh, that seems horrible. Is there some simpler idea along these lines that could make sure that fsync() is only ever called on file descriptors that were opened before all unflushed writes, or file descriptors cloned from such file descriptors? </blockquote> ... and got stuck on "yuck, that's awful". I was assuming we'd force early checkpoints if the checkpointer hit its fd limit, but that's even worse. We'd need to urgently do away with segmented relations, and partitions would start to become a hindrance. Even then it's going to be an unworkable nightmare with heavily partitioned systems, systems that use schema-sharding, etc. And it'll mean we need to play with process limits and, often, system-wide limits on FDs. I imagine the performance implications won't be pretty. Idea 2: <blockquote> Give up, complain that this implementation is defective and unworkable, both on POSIX-compliance grounds and on POLA grounds, and campaign to get it fixed more fundamentally (actual details left to the experts, no point in speculating here, but we've seen a few approaches that work on other operating systems including keeping buffers dirty and marking the whole filesystem broken/read-only). </blockquote> This appears to be what SQLite does AFAICS.
<a href="https://www.sqlite.org/atomiccommit.html">https://www.sqlite.org/atomiccommit.html</a>, though it has the huge luxury of a single writer, so it's probably only subject to the original issue, not the multiprocess/checkpointer issues we face. <blockquote> Idea 3: Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP. </blockquote> That seems to be what the kernel folks will expect. But that's going to KILL performance. We'll need writer threads to have any hope of it not totally sucking, because otherwise simple things like updating a heap tuple and two related indexes will incur enormous disk latencies. But I suspect it's the path forward. Goody. <blockquote> Any other ideas? For a while I considered suggesting an idea which I now think doesn't work. I thought we could try asking for a new fcntl interface that spits out the wb_err counter. Call it an opaque error token or something. Then we could store it in our fsync queue and safely close the file. Check again before fsync()ing, and if we ever see a different value, PANIC because it means a writeback error happened while we weren't looking. Sadly I think it doesn't work because AIUI inodes are not pinned in kernel memory when no one has the file open and there are no dirty buffers, so I think the counters could go away and be reset. Perhaps you could keep inodes pinned by keeping the associated buffers dirty after an error (like FreeBSD), but if you did that you'd have solved the problem already and wouldn't really need the wb_err system at all. Is there some other idea along these lines that could work? </blockquote> I think our underlying data syncing concept is fundamentally broken, and it's not really the kernel's fault. We assume that we can safely: <pre><code>procA: open()
procA: write()
procA: close()
</code></pre> ... some long time later, unbounded as far as the kernel is concerned ...
<pre><code>procB: open()
procB: fsync()
procB: close()
</code></pre> If the kernel does writeback in the middle, how on earth is it supposed to know we expect to reopen the file and check back later? Should it just remember "this file had an error" forever, and tell every caller? In that case how could we recover? We'd need some new API to say "yeah, ok already, I'm redoing all my work since the last good fsync() so you can clear the error flag now". Otherwise it'd keep reporting an error after we did redo to recover, too. It never really clicked for me that we closed relations with pending buffered writes, left them closed, then reopened them to fsync. That's... well, the kernel isn't the only thing doing crazy things here. Right now I think we're at option (4): If you see anything that smells like a write error in your kernel logs, hard-kill postgres with -m immediate (do NOT let it do a shutdown checkpoint). If it did a checkpoint since the logs, fake up a backup label to force redo to start from the last checkpoint before the error. Otherwise, it's safe to just let it start up again and do redo again. Fun times. This also means AFAICS that running Pg on NFS is extremely unsafe; you MUST make sure you don't run out of disk, because the usual safeguard of space reservation against ENOSPC in fsync doesn't apply to NFS. (I haven't tested this with NFSv3 in sync,hard,nointr mode yet; maybe that's safe, but I doubt it.) The same applies to thin-provisioned storage. Just. Don't. This helps explain various reports of corruption in Docker and various other tools that use various sorts of thin provisioning. If you hit ENOSPC in fsync(), bye bye data. <hr /> <pre><code>From:Peter Geoghegan <pg(at)bowt(dot)ie> Date:2018-04-08 03:37:06 </code></pre> On Sat, Apr 7, 2018 at 8:27 PM, Craig Ringer wrote: <blockquote> More below, but here's an idea #5: decide InnoDB has the right idea, and go to using a single massive blob file, or a few giant blobs.
We have a storage abstraction that makes this way, way less painful than it should be. We can virtualize relfilenodes into storage extents in relatively few big files. We could use sparse regions to make the addressing more convenient, but that makes copying and backup painful, so I'd rather not. Even one file per tablespace for persistent relation heaps, another for indexes, another for each fork type. </blockquote> I'm not sure that we can do that now, since it would break the new "Optimize btree insertions for common case of increasing values" optimization. (I did mention this before it went in.) I've asked Pavan to at least add a note to the nbtree README that explains the high level theory behind the optimization, as part of post-commit clean-up. I'll ask him to say something about how it might affect extent-based storage, too. <hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-08 03:46:17 </code></pre> <blockquote> On Apr 7, 2018, at 20:27, Craig Ringer wrote: Right now I think we're at option (4): If you see anything that smells like a write error in your kernel logs, hard-kill postgres with -m immediate (do NOT let it do a shutdown checkpoint). If it did a checkpoint since the logs, fake up a backup label to force redo to start from the last checkpoint before the error. Otherwise, it's safe to just let it start up again and do redo again. </blockquote> Before we spiral down into despair and excessive alcohol consumption, this is basically the same situation as a checksum failure or some other kind of uncorrected media-level error. The bad part is that we have to find out from the kernel logs rather than from PostgreSQL directly. But this does not strike me as otherwise significantly different from, say, an infrequently-accessed disk block reporting an uncorrectable error when we finally get around to reading it. 
<hr /> <pre><code>From:Andreas Karlsson <andreas(at)proxel(dot)se> Date:2018-04-08 09:41:06 </code></pre> On 04/08/2018 05:27 AM, Craig Ringer wrote: <blockquote> More below, but here's an idea #5: decide InnoDB has the right idea, and go to using a single massive blob file, or a few giant blobs. </blockquote> FYI: MySQL has by default one file per table these days. The old approach with one massive file was a maintenance headache, so they changed the default some releases ago. <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-multiple-tablespaces.html">https://dev.mysql.com/doc/refman/8.0/en/innodb-multiple-tablespaces.html</a> <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-08 10:30:31 </code></pre> On 8 April 2018 at 11:46, Christophe Pettus wrote: <blockquote> On Apr 7, 2018, at 20:27, Craig Ringer wrote: Right now I think we're at option (4): If you see anything that smells like a write error in your kernel logs, hard-kill postgres with -m immediate (do NOT let it do a shutdown checkpoint). If it did a checkpoint since the logs, fake up a backup label to force redo to start from the last checkpoint before the error. Otherwise, it's safe to just let it start up again and do redo again. Before we spiral down into despair and excessive alcohol consumption, this is basically the same situation as a checksum failure or some other kind of uncorrected media-level error. The bad part is that we have to find out from the kernel logs rather than from PostgreSQL directly. But this does not strike me as otherwise significantly different from, say, an infrequently-accessed disk block reporting an uncorrectable error when we finally get around to reading it. </blockquote> I don't entirely agree, because it affects ENOSPC, I/O errors on thin provisioned storage, I/O errors on multipath storage, etc.
(I identified the original issue on a thin provisioned system that ran out of backing space, mangling PostgreSQL in a way that made no sense at the time.) These are way more likely than bit flips or other storage level corruption, and things that we previously expected to detect and fail gracefully for. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-08 10:31:24 </code></pre> On 8 April 2018 at 17:41, Andreas Karlsson wrote: <blockquote> On 04/08/2018 05:27 AM, Craig Ringer wrote: <blockquote> More below, but here's an idea #5: decide InnoDB has the right idea, and go to using a single massive blob file, or a few giant blobs. </blockquote> FYI: MySQL has by default one file per table these days. The old approach with one massive file was a maintenance headache, so they changed the default some releases ago. <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-multiple-tablespaces.html">https://dev.mysql.com/doc/refman/8.0/en/innodb-multiple-tablespaces.html</a> </blockquote> Huh, thanks for the update. We should see how they handle reliable flushing and see if they've looked into it. If they haven't, we should give them a heads-up, and if they have, let's learn from them. <hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-08 16:38:03 </code></pre> <blockquote> On Apr 8, 2018, at 03:30, Craig Ringer wrote: These are way more likely than bit flips or other storage level corruption, and things that we previously expected to detect and fail gracefully for. </blockquote> This is definitely bad, and it explains a few otherwise-inexplicable corruption issues we've seen. (And great work tracking it down!) I think it's important not to panic, though; PostgreSQL doesn't have a reputation for horrible data integrity. I'm not sure it makes sense to do a major rearchitecting of the storage layer (especially with pluggable storage coming along) to address this.
While the failure modes are more common, the solution (a PITR backup) is one that an installation should have anyway against media failures. <hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-08 21:23:21 </code></pre> On 8 April 2018 at 04:27, Craig Ringer wrote: <blockquote> On 8 April 2018 at 10:16, Thomas Munro wrote: If the kernel does writeback in the middle, how on earth is it supposed to know we expect to reopen the file and check back later? Should it just remember "this file had an error" forever, and tell every caller? In that case how could we recover? We'd need some new API to say "yeah, ok already, I'm redoing all my work since the last good fsync() so you can clear the error flag now". Otherwise it'd keep reporting an error after we did redo to recover, too. </blockquote> There is no spoon^H^H^H^H^Herror flag. We don't need fsync to keep track of any errors. We just need fsync to accurately report whether all the buffers in the file have been written out. When you call fsync again the kernel needs to initiate i/o on all the dirty buffers and block until they complete successfully. If they complete successfully then nobody cares whether they had some failure in the past when i/o was initiated at some point in the past. The problem is not that errors aren't being tracked correctly. The problem is that dirty buffers are being marked clean when they haven't been written out. They consider dirty filesystem buffers when there's hardware failure preventing them from being written "a memory leak". As long as any error means the kernel has discarded writes then there's no real hope of any reliable operation through that interface. Going to DIRECTIO is basically recognizing this: that the kernel filesystem buffer provides no reliable interface, so we need to reimplement it ourselves in user space. It's rather disheartening.
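Going to direct I/O also means taking on alignment rules that the page cache used to hide: on Linux, O_DIRECT may require the buffer address, the transfer length and the file offset to be suitably aligned (see open(2)). A minimal sketch, assuming 4096 bytes satisfies the device and that the caller opened fd with O_DIRECT | O_SYNC (O_DIRECT is Linux-specific and needs _GNU_SOURCE with glibc). DIO_ALIGN, dio_round_up() and aligned_pwrite() are hypothetical names, and none of the write scheduling Greg describes is attempted here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096 bytes satisfies the device's O_DIRECT alignment rules.
 * Real code should query the device or filesystem instead. */
#define DIO_ALIGN 4096

static size_t dio_round_up(size_t n)
{
    return (n + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
}

/* Copy len bytes into an aligned, zero-padded bounce buffer and write the
 * whole padded buffer at an aligned offset. Returns bytes written (the
 * padded length), or -1 with errno set. */
static ssize_t aligned_pwrite(int fd, const void *data, size_t len, off_t offset)
{
    if (offset % DIO_ALIGN != 0) {
        errno = EINVAL;
        return -1;
    }
    if (len == 0)
        return 0;

    size_t padded = dio_round_up(len);
    void *buf = NULL;
    int rc = posix_memalign(&buf, DIO_ALIGN, padded);
    if (rc != 0) {
        errno = rc;             /* posix_memalign returns, not sets, errno */
        return -1;
    }
    memset(buf, 0, padded);     /* zero the tail past len */
    memcpy(buf, data, len);

    ssize_t n = pwrite(fd, buf, padded, offset);
    int saved = errno;
    free(buf);
    errno = saved;
    return n;
}
```

With O_DIRECT | O_SYNC, a failed write comes back from this very call rather than from some later fsync() in another process, which is the attraction; the cost is that caching and write batching become the application's job.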
Aside from having to do all that work we have the added barrier that we don't have as much information about the hardware as the kernel has. We don't know where raid stripes begin and end, how big the memory controller buffers are or how to tell when they're full or empty or how to flush them. etc etc. We also don't know what else is going on on the machine. <hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-08 21:28:43 </code></pre> <blockquote> On Apr 8, 2018, at 14:23, Greg Stark wrote: They consider dirty filesystem buffers when there's hardware failure preventing them from being written "a memory leak". </blockquote> That's not an irrational position. File system buffers are not dedicated memory for file system caching; they're being used for that because no one has a better use for them at that moment. If an inability to flush them to disk meant that they suddenly became pinned memory, a large copy operation to a yanked USB drive could result in the system having no more allocatable memory. I guess in theory that they could swap them, but swapping out a file system buffer in hopes that sometime in the future it could be properly written doesn't seem very architecturally sound to me. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-08 21:47:04 </code></pre> On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: <blockquote> On 8 April 2018 at 04:27, Craig Ringer wrote: <blockquote> On 8 April 2018 at 10:16, Thomas Munro wrote: If the kernel does writeback in the middle, how on earth is it supposed to know we expect to reopen the file and check back later? Should it just remember "this file had an error" forever, and tell every caller? In that case how could we recover? We'd need some new API to say "yeah, ok already, I'm redoing all my work since the last good fsync() so you can clear the error flag now". Otherwise it'd keep reporting an error after we did redo to recover, too. 
</blockquote> There is no spoon^H^H^H^H^Herror flag. We don't need fsync to keep track of any errors. We just need fsync to accurately report whether all the buffers in the file have been written out. When you call fsync </blockquote> Instead, fsync() reports when some of the buffers have not been written out, due to reasons outlined before. As such it may make some sense to maintain some tracking regarding errors even after marking failed dirty pages as clean (in fact it has been proposed, but this introduces memory overhead). <blockquote> again the kernel needs to initiate i/o on all the dirty buffers and block until they complete successfully. If they complete successfully then nobody cares whether they had some failure in the past when i/o was initiated at some point in the past. </blockquote> The question is, what should the kernel and application do in cases where this is simply not possible (according to FreeBSD, which keeps dirty pages around after failure, for example, -EIO from the block layer is a contract for unrecoverable errors, so it is pointless to keep them dirty). You'd need a specialized interface to clear out the errors (and drop the dirty pages), or potentially just remount the filesystem. <blockquote> The problem is not that errors aren't being tracked correctly. The problem is that dirty buffers are being marked clean when they haven't been written out. They consider dirty filesystem buffers when there's hardware failure preventing them from being written "a memory leak". As long as any error means the kernel has discarded writes then there's no real hope of any reliable operation through that interface. </blockquote> This does not necessarily follow. Whether the kernel discards writes or not would not really help (see above). It is more a matter of a proper "reporting contract" between userspace and kernel, and tracking would be a way of facilitating this vs.
having a more complex userspace scheme (as described by others in this thread) where synchronization for fsync() is required in a multi-process application. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-08 22:29:16 </code></pre> On Sun, Apr 8, 2018 at 09:38:03AM -0700, Christophe Pettus wrote: <blockquote> <blockquote> On Apr 8, 2018, at 03:30, Craig Ringer wrote: These are way more likely than bit flips or other storage level corruption, and things that we previously expected to detect and fail gracefully for. </blockquote> This is definitely bad, and it explains a few otherwise-inexplicable corruption issues we've seen. (And great work tracking it down!) I think it's important not to panic, though; PostgreSQL doesn't have a reputation for horrible data integrity. I'm not sure it makes sense to do a major rearchitecting of the storage layer (especially with pluggable storage coming along) to address this. While the failure modes are more common, the solution (a PITR backup) is one that an installation should have anyway against media failures. </blockquote> I think the big problem is that we don't have any way of stopping Postgres at the time the kernel reports the errors to the kernel log, so we are then returning potentially incorrect results and committing transactions that might be wrong or lost. If we could stop Postgres when such errors happen, at least the administrator could fix the problem or fail over to a standby. A crazy idea would be to have a daemon that checks the logs and stops Postgres when it sees something wrong.
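The core of such a watchdog is just classifying log lines. A minimal, hypothetical C sketch of that matching step: the substrings come from the XFS messages and the PostgreSQL server log quoted earlier in this thread, plus an assumed generic "I/O error" pattern; a real watchdog would need a list tuned to its own filesystems and drivers. On a match it might stop the server with pg_ctl stop -m immediate, which avoids the shutdown checkpoint Craig warns against above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical pattern list. The first and third entries appear in the
 * logs quoted in this thread; "I/O error" is an assumed generic pattern. */
static const char *const write_error_patterns[] = {
    "writeback error",        /* XFS (dm-0): writeback error on sector ... */
    "I/O error",
    "could not fsync file",   /* PostgreSQL server log */
};

/* Returns true if a kernel or server log line looks like a write failure. */
static bool line_looks_like_write_error(const char *line)
{
    size_t n = sizeof write_error_patterns / sizeof write_error_patterns[0];

    for (size_t i = 0; i < n; i++)
        if (strstr(line, write_error_patterns[i]) != NULL)
            return true;
    return false;
}
```

Substring matching is crude and will both miss and over-report; it only narrows the window in which Postgres keeps committing after the kernel has already dropped writes.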
<hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-08 23:10:24 </code></pre> <blockquote> On Apr 8, 2018, at 15:29, Bruce Momjian wrote: I think the big problem is that we don't have any way of stopping Postgres at the time the kernel reports the errors to the kernel log, so we are then returning potentially incorrect results and committing transactions that might be wrong or lost. </blockquote> Yeah, it's bad. In the short term, the best advice to installations is to monitor their kernel logs for errors (which very few do right now), and make sure they have a backup strategy which can encompass restoring from an error like this. Even Craig's smart fix of patching the backup label to recover from a previous checkpoint doesn't do much good if we don't have WAL records back that far (or one of the required WAL records also took a hit). In the longer term... O_DIRECT seems like the most plausible way out of this, but that might be unpopular with people running on file systems or OSes that don't have this issue. (Setting aside the daunting prospect of implementing that.) <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-08 23:16:25 </code></pre> On 2018-04-08 18:29:16 -0400, Bruce Momjian wrote: <blockquote> On Sun, Apr 8, 2018 at 09:38:03AM -0700, Christophe Pettus wrote: <blockquote> <blockquote> On Apr 8, 2018, at 03:30, Craig Ringer wrote: These are way more likely than bit flips or other storage level corruption, and things that we previously expected to detect and fail gracefully for. </blockquote> This is definitely bad, and it explains a few otherwise-inexplicable corruption issues we've seen. (And great work tracking it down!) I think it's important not to panic, though; PostgreSQL doesn't have a reputation for horrible data integrity. I'm not sure it makes sense to do a major rearchitecting of the storage layer (especially with pluggable storage coming along) to address this.
While the failure modes are more common, the solution (a PITR backup) is one that an installation should have anyway against media failures. </blockquote> I think the big problem is that we don't have any way of stopping Postgres at the time the kernel reports the errors to the kernel log, so we are then returning potentially incorrect results and committing transactions that might be wrong or lost. If we could stop Postgres when such errors happen, at least the administrator could fix the problem or fail over to a standby. A crazy idea would be to have a daemon that checks the logs and stops Postgres when it sees something wrong. </blockquote> I think the danger presented here is far smaller than some of the statements in this thread might make one think. In all likelihood, once you've got an IO error that kernel level retries don't fix, your database is screwed. Whether fsync reports that or not is really somewhat beside the point. We don't panic that way when getting IO errors during reads either, and they're more likely to be persistent than errors during writes (because remapping on storage layer can fix issues, but not during reads). There's a lot of not so great things here, but I don't think there's any need to panic. We should fix things so that reported errors are treated with crash recovery, and for the rest I think there's very fair arguments to be made that that's far outside postgres's remit. I think there's pretty good reasons to go to direct IO where supported, but error handling doesn't strike me as a particularly good reason for the move. <hr /> <pre><code>From:Christophe Pettus <xof(at)thebuild(dot)com> Date:2018-04-08 23:27:57 </code></pre> <blockquote> On Apr 8, 2018, at 16:16, Andres Freund wrote: We don't panic that way when getting IO errors during reads either, and they're more likely to be persistent than errors during writes (because remapping on storage layer can fix issues, but not during reads).
</blockquote> There is a distinction to be drawn there, though, because we immediately pass an error back to the client on a read, but a write problem in this situation can be masked for an extended period of time. That being said... <blockquote> There's a lot of not so great things here, but I don't think there's any need to panic. </blockquote> No reason to panic, yes. We can assume that if this was a very big persistent problem, it would be much more widely reported. It would, however, be good to find a way to get the error surfaced back up to the client in a way that is not just monitoring the kernel logs. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-09 01:31:56 </code></pre> On 9 April 2018 at 05:28, Christophe Pettus wrote: <blockquote> <blockquote> On Apr 8, 2018, at 14:23, Greg Stark wrote: They consider dirty filesystem buffers when there's hardware failure preventing them from being written "a memory leak". </blockquote> That's not an irrational position. File system buffers are not dedicated memory for file system caching; they're being used for that because no one has a better use for them at that moment. If an inability to flush them to disk meant that they suddenly became pinned memory, a large copy operation to a yanked USB drive could result in the system having no more allocatable memory. I guess in theory that they could swap them, but swapping out a file system buffer in hopes that sometime in the future it could be properly written doesn't seem very architecturally sound to me. </blockquote> Yep. Another example is a write to an NFS or iSCSI volume that goes away forever. What if the app keeps write()ing in the hopes it'll come back, and by the time the kernel starts reporting EIO for write(), it's already saddled with a huge volume of dirty writeback buffers it can't get rid of because someone, one day, might want to know about them? 
You could make the argument that it's OK to forget if the entire file system goes away. But actually, why is that ok? What if it's remounted again? That'd be really bad too, for someone expecting write reliability. You can coarsen from dirty buffer tracking to marking the FD(s) bad, but what if there's no FD to mark because the file isn't open at the moment? You can mark the inode cache entry and pin it, I guess. But what if your app triggered I/O errors over vast numbers of small files? Again, the kernel's left holding the ball. It doesn't know if/when an app will return to check. It doesn't know how long to remember the failure for. It doesn't know when all interested clients have been informed and it can treat the fault as cleared/repaired, either, so it'd have to keep on reporting EIO for PostgreSQL's own writes and fsync()s indefinitely, even once we do recovery. The only way it could avoid that would be to keep the dirty writeback pages around and flagged bad, then clear the flag when a new write() replaces the same file range. I can't imagine that being practical. Blaming the kernel for this sure is the easy way out. But IMO we cannot rationally expect the kernel to remember error state forever for us, then forget it when we expect, all without actually telling it anything about our activities or even that we still exist and are still interested in the files/writes. We've closed the files and gone away. Whatever we do, it's likely going to have to involve not doing that anymore. Even if we can somehow convince the kernel folks to add a new interface for us that reports I/O errors to some listener, like an inotify/fnotify/dnotify/whatever-it-is-today-notify extension reporting errors in buffered async writes, we won't be able to rely on having it for 5-10 years, and only on Linux.
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-09 01:35:06 </code></pre> On 9 April 2018 at 06:29, Bruce Momjian wrote: <blockquote> I think the big problem is that we don't have any way of stopping Postgres at the time the kernel reports the errors to the kernel log, so we are then returning potentially incorrect results and committing transactions that might be wrong or lost. </blockquote> Right. Specifically, we need a way to ask the kernel at checkpoint time "was everything written to [this set of files] flushed successfully since the last time I asked, no matter who did the writing and no matter how the writes were flushed?" If the result is "no" we PANIC and redo. If the hardware/volume is screwed, the user can fail over to a standby, do PITR, etc. But we don't have any way to ask that reliably at present. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 01:55:10 </code></pre> Hi, On 2018-04-08 16:27:57 -0700, Christophe Pettus wrote: <blockquote> <blockquote> On Apr 8, 2018, at 16:16, Andres Freund wrote: We don't panic that way when getting IO errors during reads either, and they're more likely to be persistent than errors during writes (because remapping on storage layer can fix issues, but not during reads). </blockquote> There is a distinction to be drawn there, though, because we immediately pass an error back to the client on a read, but a write problem in this situation can be masked for an extended period of time. </blockquote> Only if you're "lucky" enough that your clients actually read that data, and then you're somehow able to figure out across the whole stack that these 0.001% of transactions that fail are due to IO errors. Or you also need to do log analysis. If you want to solve things like that you need regular reads of all your data, including verifications etc. 
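Lacking the interface Craig asks for, the closest a checkpoint can get with plain POSIX calls is to `fsync()` every file it depends on and treat any failure as fatal rather than retrying it, which is the PANIC policy proposed in this thread. A minimal sketch under that assumption; `checkpoint_fsync_all` and `panic` are hypothetical names, not PostgreSQL's code. It still does not answer Craig's question reliably: an error reported to a different descriptor, or raised before this `open()`, may never be seen here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Report the failing step and stop the process: a stand-in for PANIC. */
static void panic(const char *what, const char *path)
{
    fprintf(stderr, "PANIC: could not %s \"%s\": %s\n", what, path, strerror(errno));
    abort();
}

/* Hypothetical checkpoint step: flush every file in paths[0..n-1].
 * Any failure stops the process instead of returning an error, because
 * a later fsync() retry may "succeed" after the kernel has marked the
 * failed pages clean. Recovery is left to WAL replay after restart. */
void checkpoint_fsync_all(const char *const *paths, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDWR);
        if (fd < 0)
            panic("open", paths[i]);
        if (fsync(fd) != 0)
            panic("fsync", paths[i]);
        /* close() can also report deferred write errors, e.g. on NFS. */
        if (close(fd) != 0)
            panic("close", paths[i]);
    }
}
```

The design choice is deliberately blunt: stopping and replaying WAL is safe whether or not the failed pages still exist in the page cache, whereas retrying the `fsync()` is only safe if they do.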
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-09 02:00:41 </code></pre> On 9 April 2018 at 07:16, Andres Freund wrote: <blockquote> I think the danger presented here is far smaller than some of the statements in this thread might make one think. </blockquote> Clearly it's not happening a huge amount or we'd have a lot of noise about Pg eating people's data, people shouting about how unreliable it is, etc. We don't. So it's not some earth shattering imminent threat to everyone's data. It's gone unnoticed, or the root cause unidentified, for a long time. I suspect we've written off a fair few issues in the past as "it's bad hardware" when actually, the hardware fault was the trigger for a Pg/kernel interaction bug. And blamed containers for things that weren't really the container's fault. But even so, if it were happening tons, we'd hear more noise. I've already been very surprised there when I learned that PostgreSQL completely ignores wholly absent relfilenodes. Specifically, if you unlink() a relation's backing relfilenode while Pg is down and that file has writes pending in the WAL. We merrily re-create it with uninitialized pages and go on our way. As Andres pointed out in an offlist discussion, redo isn't a consistency check, and it's not obliged to fail in such cases. We can say "well, don't do that then" and define away file losses from FS corruption etc as not our problem, the lower levels we expect to take care of this have failed. We have to look at what checkpoints are and are not supposed to promise, and whether this is a problem we just define away as "not our problem, the lower level failed, we're not obliged to detect this and fail gracefully." We can choose to say that checkpoints are required to guarantee crash/power loss safety ONLY and do not attempt to protect against I/O errors of any sort. In fact, I think we should likely amend the documentation for release versions to say just that.
<blockquote> In all likelihood, once you've got an IO error that kernel level retries don't fix, your database is screwed. </blockquote> Your database is going to be down or have interrupted service. It's possible you may have some unreadable data. This could result in localised damage to one or more relations. That could affect FK relationships, indexes, all sorts. If you're really unlucky you might lose something critical like pg_clog/ contents. But in general your DB should be repairable/recoverable even in those cases. And in many failure modes there's no reason to expect any data loss at all, like: <ul> <li>Local disk fills up (seems to be safe already due to space reservation at write() time)</li> <li>Thin-provisioned storage backing local volume iSCSI or paravirt block device fills up</li> <li>NFS volume fills up</li> <li>Multipath I/O error</li> <li>Interruption of connectivity to network block device</li> <li>Disk develops localized bad sector where we haven't previously written data</li> </ul> Except for the ENOSPC on NFS, all the rest of the cases can be handled by expecting the kernel to retry forever and not return until the block is written or we reach the heat death of the universe. And NFS, well... Part of the trouble is that the kernel won't retry forever in all these cases, and doesn't seem to have a way to ask it to in all cases. And if the user hasn't configured it for the right behaviour in terms of I/O error resilience, we don't find out about it. So it's not the end of the world, but it'd sure be nice to fix. <blockquote> Whether fsync reports that or not is really somewhat beside the point. We don't panic that way when getting IO errors during reads either, and they're more likely to be persistent than errors during writes (because remapping on storage layer can fix issues, but not during reads). </blockquote> That's because reads don't make promises about what's committed and synced. I think that's quite different.
<blockquote> We should fix things so that reported errors are treated with crash recovery, and for the rest I think there's very fair arguments to be made that that's far outside postgres's remit. </blockquote> Certainly for current versions. I think we need to think about a more robust path in future. But it's certainly not "stop the world" territory. The docs need an update to indicate that we explicitly disclaim responsibility for I/O errors on async writes, and that the kernel and I/O stack must be configured never to give up on buffered writes. If it does, that's not our problem anymore. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 02:06:12 </code></pre> On 2018-04-09 10:00:41 +0800, Craig Ringer wrote: <blockquote> I suspect we've written off a fair few issues in the past as "it's bad hardware" when actually, the hardware fault was the trigger for a Pg/kernel interaction bug. And blamed containers for things that weren't really the container's fault. But even so, if it were happening tons, we'd hear more noise. </blockquote> Agreed on that, but I think that's FAR more likely to be things like multixacts, index structure corruption due to logic bugs etc. <blockquote> I've already been very surprised there when I learned that PostgreSQL completely ignores wholly absent relfilenodes. Specifically, if you unlink() a relation's backing relfilenode while Pg is down and that file has writes pending in the WAL. We merrily re-create it with uninitialized pages and go on our way. As Andres pointed out in an offlist discussion, redo isn't a consistency check, and it's not obliged to fail in such cases. We can say "well, don't do that then" and define away file losses from FS corruption etc as not our problem, the lower levels we expect to take care of this have failed. </blockquote> And it'd be a really bad idea to behave differently.
<blockquote> And in many failure modes there's no reason to expect any data loss at all, like: <ul> <li>Local disk fills up (seems to be safe already due to space reservation at write() time)</li> </ul> </blockquote> That definitely should be treated separately. <blockquote> <ul> <li>Thin-provisioned storage backing local volume iSCSI or paravirt block device fills up</li> <li>NFS volume fills up</li> </ul> </blockquote> Those should be the same as the above. <blockquote> I think we need to think about a more robust path in future. But it's certainly not "stop the world" territory. </blockquote> I think you're underestimating the complexity of doing that by at least two orders of magnitude. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-09 03:15:01 </code></pre> On 9 April 2018 at 10:06, Andres Freund wrote: <blockquote> <blockquote> And in many failure modes there's no reason to expect any data loss at all, like: <ul> <li>Local disk fills up (seems to be safe already due to space reservation at write() time)</li> </ul> </blockquote> That definitely should be treated separately. </blockquote> It is, because all the FSes I looked at reserve space before returning from write(), even if they do delayed allocation. So they won't fail with ENOSPC at fsync() time or silently due to lost errors on background writeback. Otherwise we'd be hearing a LOT more noise about this. <blockquote> <blockquote> <ul> <li>Thin-provisioned storage backing local volume iSCSI or paravirt block device fills up</li> <li>NFS volume fills up</li> </ul> </blockquote> Those should be the same as the above. </blockquote> Unfortunately, they aren't. AFAICS NFS doesn't reserve space with the other end before returning from write(), even if mounted with the sync option. So we can get ENOSPC lazily when the buffer writeback fails due to a full backing file system. 
This then travels the same paths as EIO: we fsync(), ERROR, retry, appear to succeed, and carry on with life losing the data. Or we never hear about the error in the first place. (There's a proposed extension that'd allow this, see <a href="https://tools.ietf.org/html/draft-iyer-nfsv4-space-reservation-ops-02#page-5">https://tools.ietf.org/html/draft-iyer-nfsv4-space-reservation-ops-02#page-5</a>, but I see no mention of it in fs/nfs. All the reserve_space / xdr_reserve_space stuff seems to be related to space in protocol messages at a quick read.) Thin provisioned storage could vary a fair bit depending on the implementation. But the specific failure case I saw, prompting this thread, was on a volume using the stack: <pre><code>xfs -> lvm2 -> multipath -> ??? -> SAN </code></pre> (the HBA/iSCSI/whatever was not recorded by the looks, but IIRC it was iSCSI. I'm checking.) The SAN ran out of space. Due to use of thin provisioning, Linux thought there was plenty of space on the volume; LVM thought it had plenty of physical extents free and unallocated, XFS thought there was tons of free space, etc. The space exhaustion manifested as I/O errors on flushes of writeback buffers. The logs were like this: <pre><code>kernel: sd 2:0:0:1: [sdd] Unhandled sense code
kernel: sd 2:0:0:1: [sdd]
kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 2:0:0:1: [sdd]
kernel: Sense Key : Data Protect [current]
kernel: sd 2:0:0:1: [sdd]
kernel: Add. Sense: Space allocation failed write protect
kernel: sd 2:0:0:1: [sdd] CDB:
kernel: Write(16): **HEX-DATA-CUT-OUT**
kernel: Buffer I/O error on device dm-0, logical block 3098338786
kernel: lost page write due to I/O error on dm-0
kernel: Buffer I/O error on device dm-0, logical block 3098338787
</code></pre> The immediate cause was that Linux's multipath driver didn't seem to recognise the sense code as retryable, so it gave up and reported it to the next layer up (LVM).
LVM and XFS both seem to think that the lower layer is responsible for retries, so they toss the write away, and tell any interested writers if they feel like it, per discussion upthread. In this case Pg did get the news and reported fsync() errors on checkpoints, but it only reported an error once per relfilenode. Once it ran out of failed relfilenodes to cause the checkpoint to ERROR, it "completed" a "successful" checkpoint and kept on running until the resulting corruption started to manifest itself and it segfaulted some time later. As we've now learned, there's no guarantee we'd even get the news about the I/O errors at all. WAL was on a separate volume that didn't run out of room immediately, so we didn't PANIC on WAL write failure and prevent the issue. In this case if Pg had PANIC'd (and been able to guarantee to get the news of write failures reliably), there'd have been no corruption and no data loss despite the underlying storage issue. If, prior to seeing this, you'd asked me "will my PostgreSQL database be corrupted if my thin-provisioned volume runs out of space" I'd have said "Surely not. PostgreSQL won't be corrupted by running out of disk space, it orders writes carefully and forces flushes so that it will recover gracefully from write failures." Except not. I was very surprised. BTW, it also turns out that the default for multipath is to give up on errors anyway; see the queue_if_no_path option and no_path_retries options. (Hint: run PostgreSQL with no_path_retries=queue). That's a sane default if you use O_DIRECT|O_SYNC, and otherwise pretty much a data-eating setup. I regularly see rather a lot of multipath systems, iSCSI systems, SAN backed systems, etc. I think we need to be pretty clear that we expect them to retry indefinitely, and if they report an I/O error we cannot reliably handle it.
We need to patch Pg to PANIC on any fsync() failure and document that Pg won't notice some storage failure modes that might otherwise be considered nonfatal or transient, so very specific storage configuration and testing is required. (Not that anyone will do it). Also warn against running on NFS even with "hard,sync,nointr". It'd be interesting to have a tool that tested error handling, allowing people to do iSCSI plug-pull tests, that sort of thing. But as far as I can tell nobody ever tests their storage stack anyway, so I don't plan on writing something that'll never get used. <blockquote> <blockquote> I think we need to think about a more robust path in future. But it's certainly not "stop the world" territory. </blockquote> I think you're underestimating the complexity of doing that by at least two orders of magnitude. </blockquote> Oh, it's just a minor total rewrite of half Pg, no big deal ;) I'm sure that no matter how big I think it is, I'm still underestimating it. The most workable option IMO would be some sort of fnotify/dnotify/whatever that reports all I/O errors on a volume. Some kind of error reporting handle we can keep open on a volume level that we can check for each volume/tablespace after we fsync() everything to see if it all really worked. If we PANIC if that gives us a bad answer, and PANIC on fsync errors, we guard against the great majority of these sorts of should-be-transient-if-the-kernel-didn't-give-up-and-throw-away-our-data errors. Even then, good luck getting those events from an NFS volume in which the backing volume experiences an issue. And it's kind of moot because AFAICS no such interface exists. 
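Craig remarks above that giving up on I/O errors is a sane multipath default only for O_DIRECT|O_SYNC writers. A minimal sketch of the O_SYNC half, assuming a POSIX system: with `O_SYNC`, each `write()` completes only once the data and its file metadata have reached stable storage (POSIX "synchronized I/O file integrity completion"), so an I/O error is returned by that very call while the caller still holds the buffer. `write_sync` is a hypothetical helper, and O_DIRECT is left out because of its alignment requirements.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes of buf at offset through an O_SYNC descriptor.
 * Each successful pwrite() has already reached stable storage, so a
 * failure is reported here rather than (maybe) at a later fsync().
 * Returns 0 on success, -1 with errno set on failure. */
int write_sync(const char *path, const void *buf, size_t len, off_t offset)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0 && errno == EINTR)
            continue;                          /* interrupted: retry */
        if (n <= 0) {
            /* Treat a zero-length write as EIO (an assumption) to avoid looping forever. */
            int saved = (n < 0) ? errno : EIO;
            close(fd);
            errno = saved;
            return -1;                         /* caller still owns buf */
        }
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

The price is a synchronous flush on every write, so this trades throughput for immediate error reporting; it is not a drop-in replacement for buffered writes followed by a single `fsync()`.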
<hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-09 08:45:40 </code></pre> On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: <blockquote> On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: <blockquote> On 8 April 2018 at 04:27, Craig Ringer wrote: <blockquote> On 8 April 2018 at 10:16, Thomas Munro </blockquote> </blockquote> The question is, what should the kernel and application do in cases where this is simply not possible (according to freebsd that keeps dirty pages around after failure, for example, -EIO from the block layer is a contract for unrecoverable errors so it is pointless to keep them dirty). You'd need a specialized interface to clear-out the errors (and drop the dirty pages), or potentially just remount the filesystem. </blockquote> Well firstly that's not necessarily the question. ENOSPC is not an unrecoverable error. And even unrecoverable errors for a single write doesn't mean the write will never be able to succeed in the future. But secondly doesn't such an interface already exist? When the device is dropped any dirty pages already get dropped with it. What's the point in dropping them but keeping the failing device? But just to underline the point. "pointless to keep them dirty" is exactly backwards from the application's point of view. If the error writing to persistent media really is unrecoverable then it's all the more critical that the pages be kept so the data can be copied to some other device. The last thing user space expects to happen is if the data can't be written to persistent storage then also immediately delete it from RAM. (And the really last thing user space expects is for this to happen and return no error.) 
<hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 10:50:41 </code></pre> On Mon, Apr 09, 2018 at 09:45:40AM +0100, Greg Stark wrote: <blockquote> On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: <blockquote> On Sun, Apr 08, 2018 at 10:23:21PM +0100, Greg Stark wrote: <blockquote> On 8 April 2018 at 04:27, Craig Ringer wrote: <blockquote> On 8 April 2018 at 10:16, Thomas Munro </blockquote> </blockquote> The question is, what should the kernel and application do in cases where this is simply not possible (according to freebsd that keeps dirty pages around after failure, for example, -EIO from the block layer is a contract for unrecoverable errors so it is pointless to keep them dirty). You'd need a specialized interface to clear-out the errors (and drop the dirty pages), or potentially just remount the filesystem. </blockquote> Well firstly that's not necessarily the question. ENOSPC is not an unrecoverable error. And even unrecoverable errors for a single write doesn't mean the write will never be able to succeed in the future. </blockquote> To make things a bit simpler, let us focus on EIO for the moment. The contract between the block layer and the filesystem layer is assumed to be that of, when an EIO is propagated up to the fs, then you may assume that all possibilities for recovering have been exhausted in lower layers of the stack. Mind you, I am not claiming that this contract is either documented or necessarily respected (in fact there have been studies on the error propagation and handling of the block layer, see [1]). Let us assume that this is the design contract though (which appears to be the case across a number of open-source kernels), and if not - it's a bug. In this case, indeed the specific write()s will never be able to succeed in the future, at least not as long as the BIOs are allocated to the specific failing LBAs. <blockquote> But secondly doesn't such an interface already exist? 
When the device is dropped any dirty pages already get dropped with it. What's the point in dropping them but keeping the failing device? </blockquote> I think there are degrees of failure. There are certainly cases where one may encounter localized unrecoverable medium errors (specific to certain LBAs) that are non-maskable from the block layer and below. That does not mean that the device is dropped at all, so it does make sense to continue all other operations to all other regions of the device that are functional. In cases of total device failure, then the filesystem will prevent you from proceeding anyway. <blockquote> But just to underline the point. "pointless to keep them dirty" is exactly backwards from the application's point of view. If the error writing to persistent media really is unrecoverable then it's all the more critical that the pages be kept so the data can be copied to some other device. The last thing user space expects to happen is if the data can't be written to persistent storage then also immediately delete it from RAM. (And the really last thing user space expects is for this to happen and return no error.) </blockquote> Right. This implies though that apart from the kernel having to keep around the dirtied-but-unrecoverable pages for an unbounded time, that there's further an interface for obtaining the exact failed pages so that you can read them back. This in turn means that there needs to be an association between the fsync() caller and the specific dirtied pages that the caller intends to drain (for which we'd need an fsync_range(), among other things). BTW, currently the failed writebacks are not dropped from memory, but rather marked clean. They could be lost though due to memory pressure or due to explicit request (e.g. proc drop_caches), unless mlocked. There is a clear responsibility of the application to keep its buffers around until a successful fsync().
The kernels do report the error (albeit with all the complexities of dealing with the interface), at which point the application may not assume that the write()s were ever even buffered in the kernel page cache in the first place. What you seem to be asking for is the capability of dropping buffers over the (kernel) fence and indemnifying the application from any further responsibility, i.e. a hard assurance that either the kernel will persist the pages or it will keep them around till the application recovers them asynchronously, the filesystem is unmounted, or the system is rebooted. [1] <a href="https://www.usenix.org/legacy/event/fast08/tech/full_papers/gunawi/gunawi.pdf">https://www.usenix.org/legacy/event/fast08/tech/full_papers/gunawi/gunawi.pdf</a> <hr /> <pre><code>From:Geoff Winkless <pgsqladmin(at)geoff(dot)dj> Date:2018-04-09 12:03:28 </code></pre> On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: <blockquote> What you seem to be asking for is the capability of dropping buffers over the (kernel) fence and indemnifying the application from any further responsibility, i.e. a hard assurance that either the kernel will persist the pages or it will keep them around till the application recovers them asynchronously, the filesystem is unmounted, or the system is rebooted. </blockquote> That seems like a perfectly reasonable position to take, frankly. The whole point of an Operating System should be that you can do exactly that. As a developer I should be able to call write() and fsync() and know that if both calls have succeeded then the result is on disk, no matter what another application has done in the meantime. If that's a "difficult" problem then that's the OS's problem, not mine. If the OS doesn't do that, it's _not_doing_its_job_.
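Anthony's point above, that the application must keep its own buffers until `fsync()` succeeds, together with his remark that failed writeback pages are currently marked clean, suggests the pattern below. A minimal sketch, assuming the caller holds the data in its own memory; `write_durably` and `pwrite_all` are hypothetical names. Because a second `fsync()` over pages that were marked clean would prove nothing, each attempt re-issues the `pwrite()` from the application's copy before flushing again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of p[0..len-1] at off, retrying short writes and EINTR. */
static bool pwrite_all(int fd, const char *p, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return false;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return true;
}

/* Hypothetical durable write. The caller keeps ownership of buf until
 * this returns true. After a failed fsync() the kernel may have marked
 * the failed pages clean, so every attempt re-dirties them by writing
 * again from the caller's copy before asking for a fresh fsync(). */
bool write_durably(int fd, const void *buf, size_t len, off_t off, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (pwrite_all(fd, buf, len, off) && fsync(fd) == 0)
            return true;   /* data reached storage; buf may now be released */
        perror("write_durably: attempt failed");
    }
    return false;          /* caller must keep buf, e.g. to copy it to another device */
}
```

The attempt limit is an arbitrary choice. When it runs out, the data still exists in the application's memory, which is exactly what Greg asks for: it can be copied to some other device instead of being silently lost.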
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-09 12:16:38 </code></pre> On 9 April 2018 at 18:50, Anthony Iliopoulos wrote: <blockquote> There is a clear responsibility of the application to keep its buffers around until a successful fsync(). The kernels do report the error (albeit with all the complexities of dealing with the interface), at which point the application may not assume that the write()s were ever even buffered in the kernel page cache in the first place. What you seem to be asking for is the capability of dropping buffers over the (kernel) fence and indemnifying the application from any further responsibility, i.e. a hard assurance that either the kernel will persist the pages or it will keep them around till the application recovers them asynchronously, the filesystem is unmounted, or the system is rebooted. </blockquote> That's what Pg appears to assume now, yes. Whether that's reasonable is a whole different topic. I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style per-directory-registration option. I'd even say that's ideal. In the meantime, I propose that we fsync() on close() before we age FDs out of the LRU on backends. Yes, that will hurt throughput and cause stalls, but we don't seem to have many better options. At least it'll only flush what we actually wrote to the OS buffers not what we may have in shared_buffers. If the bgwriter does the same thing, we should be 100% safe from this problem on 4.13+, and it'd be trivial to make it a GUC much like the fsync or full_page_writes options that people can turn off if they know the risks / know their storage is safe / don't care. Some keen person who wants to later could optimise it by adding a fsync worker thread pool in backends, so we don't block the main thread.
Frankly that might be a nice thing to have in the checkpointer anyway. But it's out of scope for fixing this in durability terms. I'm partway through a patch that makes fsync panic on errors now. Once that's done, the next step will be to force fsync on close() in md and see how we go with that. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 12:31:27 </code></pre> On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote: <blockquote> On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: <blockquote> What you seem to be asking for is the capability of dropping buffers over the (kernel) fence and indemnifying the application from any further responsibility, i.e. a hard assurance that either the kernel will persist the pages or it will keep them around till the application recovers them asynchronously, the filesystem is unmounted, or the system is rebooted. </blockquote> That seems like a perfectly reasonable position to take, frankly. </blockquote> Indeed, as long as you are willing to ignore the consequences of this design decision: mainly, how you would recover memory when no application is interested in clearing the error. At which point other applications with different priorities will find this position rather unreasonable since there can be no way out of it for them. Good luck convincing any OS kernel upstream to go with this design. <blockquote> The whole point of an Operating System should be that you can do exactly that. As a developer I should be able to call write() and fsync() and know that if both calls have succeeded then the result is on disk, no matter what another application has done in the meantime. If that's a "difficult" problem then that's the OS's problem, not mine. If the OS doesn't do that, it's _not_doing_its_job_. </blockquote> No OS kernel that I know of provides any promises for atomicity of a write()+fsync() sequence, unless one is using O_SYNC.
The kernel doesn't provide you with isolation either, as this is delegated to userspace, where processes that share a file should coordinate accordingly. It's not a difficult problem, but rather the kernels provide a common denominator of possible interfaces and designs that could accommodate a wider range of potential application scenarios for which the kernel cannot possibly anticipate requirements. There have been plenty of experimental works for providing a transactional (ACID) filesystem interface to applications. On the opposite end, there have been quite a few commercial databases that completely bypass the kernel storage stack. But I would assume it is reasonable to figure out something between those two extremes that can work in a "portable" fashion. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 12:54:16 </code></pre> On Mon, Apr 09, 2018 at 08:16:38PM +0800, Craig Ringer wrote: <blockquote> I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style per-directory-registration option. I'd even say that's ideal. </blockquote> I see what you are saying. So basically you'd always keep the notification descriptor open, where the kernel would inject events related to writeback failures of files under watch (potentially enriched to contain info regarding the exact failed pages and the file offset they map to). The kernel wouldn't even have to maintain per-page bits to trace the errors, since they will be consumed by the process that reads the events (or discarded, when the notification fd is closed).
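The notification scheme Anthony describes can be modelled in a few lines. No such kernel interface exists; `WritebackErrorChannel`, `WritebackErrorEvent`, `report()` and `read_events()` are hypothetical names. In this simulation the "kernel" pushes a (path, offset, length) record for each failed writeback under the watched directory, the watcher consumes records by reading them, and anything still queued is discarded when the channel is closed, so no per-page error bits need to be kept.

```python
import os
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class WritebackErrorEvent:
    """Hypothetical record the kernel would push for a failed writeback."""
    path: str
    offset: int
    length: int


class WritebackErrorChannel:
    """Simulated inotify-style channel for writeback failures under one directory."""

    def __init__(self, watched_dir):
        self.watched_dir = os.path.abspath(watched_dir)
        self._queue = deque()
        self.closed = False

    def report(self, path, offset, length):
        """Called by the (simulated) kernel when writeback of a file range fails."""
        if self.closed:
            return  # nobody is listening, so the event is simply dropped
        path = os.path.abspath(path)
        if os.path.commonpath([path, self.watched_dir]) == self.watched_dir:
            self._queue.append(WritebackErrorEvent(path, offset, length))

    def read_events(self):
        """Consume and return all pending events, like read() on the notification fd."""
        events = list(self._queue)
        self._queue.clear()
        return events

    def close(self):
        """Closing the channel discards anything not yet consumed."""
        self._queue.clear()
        self.closed = True
```

Such a channel only reports failures; as Anthony goes on to say, it would not stop other writers from carrying on.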
Assuming this would be possible, wouldn't Pg still need to deal with synchronizing writers and related issues (since this would merely be a notification mechanism that does not prevent any process from continuing), which I understand would be rather intrusive for the current Pg multi-process design? But other than that, this interface could in principle be implemented similarly in the BSDs via kqueue(), I suppose, to provide what you need. <hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 13:33:18 </code></pre> On 04/09/2018 02:31 PM, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote: <blockquote> On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: <blockquote> What you seem to be asking for is the capability of dropping buffers over the (kernel) fence and indemnifying the application from any further responsibility, i.e. a hard assurance that either the kernel will persist the pages or it will keep them around till the application recovers them asynchronously, the filesystem is unmounted, or the system is rebooted. </blockquote> That seems like a perfectly reasonable position to take, frankly. </blockquote> Indeed, as long as you are willing to ignore the consequences of this design decision: mainly, how you would recover memory when no application is interested in clearing the error. At which point other applications with different priorities will find this position rather unreasonable since there can be no way out of it for them. </blockquote> Sure, but the question is whether the system can reasonably operate after some of the writes failed and the data got lost. Because if it can't, then recovering the memory is rather useless. It might be better to stop the system in that case, forcing the system administrator to resolve the issue somehow (fail-over to a replica, perform recovery from the last checkpoint, ...).
We already have dirty_bytes and dirty_background_bytes, for example. I don't see why there couldn't be another limit defining how much dirty data to allow before blocking writes altogether. I'm sure it's not that simple, but you get the general idea - do not allow using all available memory because of writeback issues, but don't throw the data away in case it's just a temporary issue. <blockquote> Good luck convincing any OS kernel upstream to go with this design. </blockquote> Well, there seem to be kernels that seem to do exactly that already. At least that's how I understand what this thread says about FreeBSD and Illumos, for example. So it's not an entirely insane design, apparently. The question is whether the current design makes it any easier for user-space developers to build reliable systems. We have tried using it, and unfortunately the answer seems to be "no" and "Use direct I/O and manage everything on your own!" <blockquote> <blockquote> The whole point of an Operating System should be that you can do exactly that. As a developer I should be able to call write() and fsync() and know that if both calls have succeeded then the result is on disk, no matter what another application has done in the meantime. If that's a "difficult" problem then that's the OS's problem, not mine. If the OS doesn't do that, it's _not_doing_its_job_. </blockquote> No OS kernel that I know of provides any promises for atomicity of a write()+fsync() sequence, unless one is using O_SYNC. It doesn't provide you with isolation either, as this is delegated to userspace, where processes that share a file should coordinate accordingly. </blockquote> We can (and do) take care of the atomicity and isolation. Implementation of those parts is obviously very application-specific, and we have WAL and locks for that purpose. I/O on the other hand seems to be a generic service provided by the OS - at least that's how we saw it until now.
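Tomas's idea of an additional limit, alongside vm.dirty_bytes and vm.dirty_background_bytes, on how much un-writable dirty data may be held before writes block can be illustrated with a toy model. This is not kernel code: `ZombiePageBudget`, `zombie_limit` and `WritesBlocked` are hypothetical, pages are plain integers, and the sketch raises an exception where a real kernel would block the writer.

```python
class WritesBlocked(Exception):
    """Hypothetical stand-in for a write() that would block until failed pages are resolved."""


class ZombiePageBudget:
    """Toy model: failed writeback keeps pages dirty ("zombies") instead of dropping them.

    New writes are refused once zombie pages reach zombie_limit, so memory use
    stays bounded while the data survives a transient failure and can be retried.
    """

    def __init__(self, zombie_limit):
        self.zombie_limit = zombie_limit
        self.dirty = set()     # pages waiting for writeback
        self.zombies = set()   # pages whose writeback failed; data still held

    def write(self, page):
        if len(self.zombies) >= self.zombie_limit:
            raise WritesBlocked(f"{len(self.zombies)} pages failed writeback")
        self.dirty.add(page)

    def writeback(self, failing_pages=frozenset()):
        """Try to flush dirty and zombie pages; return the pages that reached storage."""
        pending = self.dirty | self.zombies
        self.zombies = {p for p in pending if p in failing_pages}  # keep, don't drop
        self.dirty = set()
        return pending - self.zombies
```

Pages that fail writeback are retried on the next pass rather than marked clean, which is the property the thread attributes to FreeBSD and Illumos; the limit is what keeps that from exhausting memory.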
<blockquote> It's not a difficult problem, but rather the kernels provide a common denominator of possible interfaces and designs that could accommodate a wider range of potential application scenarios for which the kernel cannot possibly anticipate requirements. There have been plenty of experimental works for providing a transactional (ACID) filesystem interface to applications. On the opposite end, there have been quite a few commercial databases that completely bypass the kernel storage stack. But I would assume it is reasonable to figure out something between those two extremes that can work in a "portable" fashion. </blockquote> Users ask us about this quite often, actually. The question is usually about "RAW devices" and performance, but ultimately it boils down to buffered vs. direct I/O. So far our answer was that we rely on the kernel to do this reliably, because they know how to do that correctly and we simply don't have the manpower to implement it (portable, reliable, handling different types of storage, ...). One has to wonder how many applications actually use this correctly, considering PostgreSQL cares about data durability/consistency so much and yet we've been misunderstanding how it works for 20+ years. <hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 13:42:35 </code></pre> On 04/09/2018 12:29 AM, Bruce Momjian wrote: <blockquote> A crazy idea would be to have a daemon that checks the logs and stops Postgres when something seems wrong. </blockquote> That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how that would work with containers (where I think you may not have access to the kernel log at all). Also, I'm pretty sure the messages do change based on kernel version (and possibly filesystem) so parsing it reliably seems rather difficult.
And we probably don't want to PANIC after an I/O error on an unrelated device, so we'd need to understand which devices are related to PostgreSQL. <hr /> <pre><code>From:Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> Date:2018-04-09 13:47:03 </code></pre> At 2018-04-09 15:42:35 +0200, tomas(dot)vondra(at)2ndquadrant(dot)com wrote: <blockquote> On 04/09/2018 12:29 AM, Bruce Momjian wrote: <blockquote> A crazy idea would be to have a daemon that checks the logs and stops Postgres when something seems wrong. </blockquote> That doesn't seem like a very practical way. </blockquote> Not least because Craig's tests showed that you can't rely on always getting an error message in the logs. <hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 13:54:19 </code></pre> On 04/09/2018 04:00 AM, Craig Ringer wrote: <blockquote> On 9 April 2018 at 07:16, Andres Freund <andres(at)anarazel(dot)de> wrote: <blockquote> I think the danger presented here is far smaller than some of the statements in this thread might make one think. </blockquote> Clearly it's not happening a huge amount or we'd have a lot of noise about Pg eating people's data, people shouting about how unreliable it is, etc. We don't. So it's not some earth-shattering imminent threat to everyone's data. It's gone unnoticed, or the root cause unidentified, for a long time. </blockquote> Yeah, it clearly isn't the case that everything we do suddenly got pointless. It's fairly annoying, though. <blockquote> I suspect we've written off a fair few issues in the past as "it's bad hardware" when actually, the hardware fault was the trigger for a Pg/kernel interaction bug. And blamed containers for things that weren't really the container's fault. But even so, if it were happening tons, we'd hear more noise. </blockquote> Right. Write errors are fairly rare, and we've probably ignored a fair number of cases demonstrating this issue.
It kind of reminds me of the wisdom that not seeing planes with bullet holes in the engine does not mean engines don't need armor [1]. [1] <a href="https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d">https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d</a> <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 14:22:06 </code></pre> On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: <blockquote> We already have dirty_bytes and dirty_background_bytes, for example. I don't see why there couldn't be another limit defining how much dirty data to allow before blocking writes altogether. I'm sure it's not that simple, but you get the general idea - do not allow using all available memory because of writeback issues, but don't throw the data away in case it's just a temporary issue. </blockquote> Sure, there could be knobs for limiting how much memory such "zombie" pages may occupy. Not sure how helpful it would be in the long run since this tends to be highly application-specific, and for something with a large data footprint one would end up tuning this accordingly in a system-wide manner. This has the potential to leave other applications running in the same system with very little memory, in cases where for example the original application crashes and never clears the error. Apart from that, further interfaces would need to be provided for actually dealing with the error (again assuming non-transient issues that may not be fixed transparently and that temporary issues are taken care of by lower layers of the stack). <blockquote> Well, there seem to be kernels that seem to do exactly that already. At least that's how I understand what this thread says about FreeBSD and Illumos, for example. So it's not an entirely insane design, apparently.
</blockquote> It is reasonable, but even FreeBSD has a big fat comment right there (since 2017), mentioning that there can be no recovery from EIO at the block layer and this needs to be done differently. No idea how an application running on top of either FreeBSD or Illumos would actually recover from this error (and clear it out), other than remounting the fs in order to force dropping of relevant pages. It does, though, provide a persistent error indication that would allow Pg to simply and reliably panic. But again this does not necessarily play well with other applications that may be using the filesystem reliably at the same time, and are now faced with EIO even though their own writes were persisted successfully. Ideally, you'd want a (potentially persistent) indication of error localized to a file region (mapping the corresponding failed writeback pages). NetBSD already implements fsync_range(), which could be a step in the right direction. <blockquote> One has to wonder how many applications actually use this correctly, considering PostgreSQL cares about data durability/consistency so much and yet we've been misunderstanding how it works for 20+ years. </blockquote> I would expect it would be very few, potentially those that have a very simple process model (e.g. embedded DBs that can abort a txn on fsync() EIO). I think that durability is a rather complex cross-layer issue which has similarly been grossly misunderstood in the past (e.g. see [1]). It seems that both the OS and DB communities greatly benefit from a periodic reality check, and I see this as an opportunity for strengthening the IO stack in an end-to-end manner.
[1] <a href="https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf">https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf</a> <hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-09 15:29:36 </code></pre> On 9 April 2018 at 15:22, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: Sure, there could be knobs for limiting how much memory such "zombie" pages may occupy. Not sure how helpful it would be in the long run since this tends to be highly application-specific, and for something with a large data footprint one would end up tuning this accordingly in a system-wide manner. </blockquote> Surely this is exactly what the kernel is there to manage. It has to control how much memory is allowed to be full of dirty buffers in the first place to ensure that the system won't get memory starved if it can't clean them fast enough. That isn't even about persistent hardware errors. Even when the hardware is working perfectly it can only flush buffers so fast. The whole point of the kernel is to abstract away shared resources. It's not like user space has any better view of the situation here. If Postgres implemented all this in DIRECT_IO it would have exactly the same problem only with less visibility into what the rest of the system is doing. If every application implemented its own buffer cache we would be back in the same boat only with a fragmented memory allocation. <blockquote> This has the potential to leave other applications running in the same system with very little memory, in cases where for example original application crashes and never clears the error. </blockquote> I still think we're speaking two different languages. There's no application anywhere that's going to "clear the error". The application has done the writes and if it's calling fsync it wants to wait until the filesystem can arrange for the write to be persisted. 
If the application could manage without the persistence then it wouldn't have called fsync. The only way to "clear out" the error would be by having the writes succeed. There's no reason to think that wouldn't be possible sometime. The filesystem could remap blocks or an administrator could replace degraded RAID device components. The only thing Postgres could do to recover would be to create a new file and move the data (reading from the dirty buffer in memory!) to it anyway, so we would "clear the error" by just no longer calling fsync on the old file. We always read fsync as a simple write barrier. That's what the documentation promised and it's what Postgres always expected. It sounds like the kernel implementors looked at it as some kind of communication channel to communicate status reports for specific writes back to user-space. That's a much more complex problem and would need an entirely different interface. I think this is why we're having so much difficulty communicating. <blockquote> It is reasonable, but even FreeBSD has a big fat comment right there (since 2017), mentioning that there can be no recovery from EIO at the block layer and this needs to be done differently. No idea how an application running on top of either FreeBSD or Illumos would actually recover from this error (and clear it out), other than remounting the fs in order to force dropping of relevant pages. It does, though, provide a persistent error indication that would allow Pg to simply and reliably panic. But again this does not necessarily play well with other applications that may be using the filesystem reliably at the same time, and are now faced with EIO even though their own writes were persisted successfully. </blockquote> Well, if they're writing to the same file that had a previous error I doubt there are many applications that would be happy to consider their writes "persisted" when the file was corrupt.
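Greg's description of the only recovery open to Postgres (write the data, still held in memory, into a new file and stop calling fsync() on the old one) corresponds to the familiar write-a-new-file-then-rename pattern. A minimal POSIX sketch; `rewrite_file_durably` is a hypothetical helper, the caller is assumed to hold the correct contents, and fsync() on a directory descriptor works on Linux but is not available on every platform.

```python
import os


def rewrite_file_durably(path, contents):
    """Hypothetical helper: replace path via a fresh file so no old error state is inherited."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(contents)
        while view:
            view = view[os.write(fd, view):]  # handle short writes
        os.fsync(fd)  # new data must be durable before it replaces the old file
    finally:
        os.close(fd)  # on failure the .tmp file is left behind for inspection
    os.replace(tmp_path, path)  # atomic rename over the old file
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry created by the rename
    finally:
        os.close(dir_fd)
```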
Ironically, the earlier discussion quoted talked about how applications that wanted more granular communication would be using O_DIRECT -- but what we have is fsync trying to be too granular such that it's impossible to get any strong guarantees about anything with it. <blockquote> <blockquote> One has to wonder how many applications actually use this correctly, considering PostgreSQL cares about data durability/consistency so much and yet we've been misunderstanding how it works for 20+ years. </blockquote> I would expect it would be very few, potentially those that have a very simple process model (e.g. embedded DBs that can abort a txn on fsync() EIO). </blockquote> Honestly I don't think there's any way to use the current interface to implement reliable operation. Even that embedded database using a single process and keeping every file open all the time (which means file descriptor limits limit its scalability) can be having silent corruption whenever some other process like a backup program comes along and calls fsync (or even sync?). <hr /> <pre><code>From:Robert Haas <robertmhaas(at)gmail(dot)com> Date:2018-04-09 16:45:00 </code></pre> On Mon, Apr 9, 2018 at 8:16 AM, Craig Ringer wrote: <blockquote> In the meantime, I propose that we fsync() on close() before we age FDs out of the LRU on backends. Yes, that will hurt throughput and cause stalls, but we don't seem to have many better options. At least it'll only flush what we actually wrote to the OS buffers not what we may have in shared_buffers. If the bgwriter does the same thing, we should be 100% safe from this problem on 4.13+, and it'd be trivial to make it a GUC much like the fsync or full_page_writes options that people can turn off if they know the risks / know their storage is safe / don't care. </blockquote> Ouch.
If a process exits -- say, because the user typed \q into psql -- then you're talking about potentially calling fsync() on a really large number of file descriptors, flushing many gigabytes of data to disk. And it may well be that you never actually wrote any data to any of those file descriptors -- those writes could have come from other backends. Or you may have written a little bit of data through those FDs, but there could be lots of other data that you end up flushing incidentally. Perfectly innocuous things like starting up a backend, running a few short queries, and then having that backend exit suddenly turn into something that could have a massive system-wide performance impact. Also, if a backend ever manages to exit without running through this code, or writes any dirty blocks afterward, then this still fails to fix the problem completely. I guess that's probably avoidable -- we can put this late in the shutdown sequence and PANIC if it fails. I have a really tough time believing this is the right way to solve the problem. We suffered for years because of ext3's desire to flush the entire page cache whenever any single file was fsync()'d, which was terrible. Eventually ext4 became the norm, and the problem went away. Now we're going to deliberately insert logic to do a very similar kind of terrible thing because the kernel developers have decided that fsync() doesn't have to do what it says on the tin? I grant that there doesn't seem to be a better option, but I bet we're going to have a lot of really unhappy users if we do this. <hr /> <pre><code>From:"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> Date:2018-04-09 17:26:24 </code></pre> On 04/09/2018 09:45 AM, Robert Haas wrote: <blockquote> On Mon, Apr 9, 2018 at 8:16 AM, Craig Ringer wrote: <blockquote> In the meantime, I propose that we fsync() on close() before we age FDs out of the LRU on backends. Yes, that will hurt throughput and cause stalls, but we don't seem to have many better options.
At least it'll only flush what we actually wrote to the OS buffers not what we may have in shared_buffers. If the bgwriter does the same thing, we should be 100% safe from this problem on 4.13+, and it'd be trivial to make it a GUC much like the fsync or full_page_writes options that people can turn off if they know the risks / know their storage is safe / don't care. </blockquote> I have a really tough time believing this is the right way to solve the problem. We suffered for years because of ext3's desire to flush the entire page cache whenever any single file was fsync()'d, which was terrible. Eventually ext4 became the norm, and the problem went away. Now we're going to deliberately insert logic to do a very similar kind of terrible thing because the kernel developers have decided that fsync() doesn't have to do what it says on the tin? I grant that there doesn't seem to be a better option, but I bet we're going to have a lot of really unhappy users if we do this. </blockquote> I don't have a better option, but whatever we do, it should be an optional (GUC) change. We have plenty of YEARS of people not noticing this issue and Robert's correct: if we go back to an era of things like stalls it is going to look bad on us no matter how we describe the problem. <hr /> <pre><code>From:Gasper Zejn <zejn(at)owca(dot)info> Date:2018-04-09 18:02:21 </code></pre> On 09. 04. 2018 15:42, Tomas Vondra wrote: <blockquote> On 04/09/2018 12:29 AM, Bruce Momjian wrote: <blockquote> A crazy idea would be to have a daemon that checks the logs and stops Postgres when something seems wrong. </blockquote> That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how that would work with containers (where I think you may not have access to the kernel log at all). Also, I'm pretty sure the messages do change based on kernel version (and possibly filesystem) so parsing it reliably seems rather difficult.
And we probably don't want to PANIC after an I/O error on an unrelated device, so we'd need to understand which devices are related to PostgreSQL. regards </blockquote> For a bit less (or more) crazy idea, I'd imagine that creating a Linux kernel module with kprobe/kretprobe, capturing the file (or even the byte range within the file) passed to fsync and the corresponding return value, shouldn't be that hard. Kprobe has been a part of the Linux kernel for a really long time, and at first glance it seems like it could be backported to 2.6 too. Then you could have stable log messages or implement some kind of "fsync error log notification" via whatever is the most sane way to get this out of the kernel. If the kernel is new enough and has eBPF support (seems like >=4.4), using bcc-tools[1] should enable you to write a quick script to get exactly that info via perf events[2]. Obviously, that's a stopgap solution ... [1] <a href="https://github.com/iovisor/bcc">https://github.com/iovisor/bcc</a> [2] <a href="https://blog.yadutaf.fr/2016/03/30/turn-any-syscall-into-event-introducing-ebpf-kernel-probes/">https://blog.yadutaf.fr/2016/03/30/turn-any-syscall-into-event-introducing-ebpf-kernel-probes/</a> <hr /> <pre><code>From:Mark Dilger <hornschnorter(at)gmail(dot)com> Date:2018-04-09 18:29:42 </code></pre> <blockquote> On Apr 9, 2018, at 10:26 AM, Joshua D. Drake wrote: We have plenty of YEARS of people not noticing this issue </blockquote> I disagree. I have noticed this problem, but blamed it on other things. For over five years now, I have had to tell customers not to use thin provisioning, and I have had to add code to postgres to refuse to perform inserts or updates if the disk volume is more than 80% full. I have lost count of the number of customers who are running an older version of the product (because they refuse to upgrade) and come back with complaints that they ran out of disk and now their database is corrupt.
All this time, I have been blaming this on virtualization and thin provisioning. <hr /> <pre><code>From:Robert Haas <robertmhaas(at)gmail(dot)com> Date:2018-04-09 19:02:11 </code></pre> On Mon, Apr 9, 2018 at 12:45 PM, Robert Haas wrote: <blockquote> Ouch. If a process exits -- say, because the user typed \q into psql -- then you're talking about potentially calling fsync() on a really large number of file descriptors, flushing many gigabytes of data to disk. And it may well be that you never actually wrote any data to any of those file descriptors -- those writes could have come from other backends. Or you may have written a little bit of data through those FDs, but there could be lots of other data that you end up flushing incidentally. Perfectly innocuous things like starting up a backend, running a few short queries, and then having that backend exit suddenly turn into something that could have a massive system-wide performance impact. Also, if a backend ever manages to exit without running through this code, or writes any dirty blocks afterward, then this still fails to fix the problem completely. I guess that's probably avoidable -- we can put this late in the shutdown sequence and PANIC if it fails. I have a really tough time believing this is the right way to solve the problem. We suffered for years because of ext3's desire to flush the entire page cache whenever any single file was fsync()'d, which was terrible. Eventually ext4 became the norm, and the problem went away. Now we're going to deliberately insert logic to do a very similar kind of terrible thing because the kernel developers have decided that fsync() doesn't have to do what it says on the tin? I grant that there doesn't seem to be a better option, but I bet we're going to have a lot of really unhappy users if we do this.
</blockquote> What about the bug we fixed in <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ce439f3379aed857517c8ce207485655000fc8e">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ce439f3379aed857517c8ce207485655000fc8e</a> ? Say somebody does something along the lines of: <pre><code>ps uxww | grep postgres | grep -v grep | awk '{print $2}' | xargs kill -9 </code></pre> ...and then restarts postgres. Craig's proposal wouldn't cover this case, because there was no opportunity to run fsync() after the first crash, and there's now no way to go back and fsync() any stuff we didn't fsync() before, because the kernel may have already thrown away the error state, or may lie to us and tell us everything is fine (because our new fd wasn't opened early enough). I can't find the original discussion that led to that commit right now, so I'm not exactly sure what scenarios we were thinking about. But I think it would at least be a problem if full_page_writes=off or if you had previously started the server with fsync=off and now wish to switch to fsync=on after completing a bulk load or similar. Recovery can read a page, see that it looks OK, and continue, and then a later fsync() failure can revert that page to an earlier state and now your database is corrupted -- and there's absolutely no way to detect this because write() gives you the new page contents later, fsync() doesn't feel obliged to tell you about the error because your fd wasn't opened early enough, and eventually the write can be discarded and you'll revert back to the old page version with no errors ever being reported anywhere. Another consequence of this behavior is that initdb -S is never reliable, so pg_rewind's use of it doesn't actually fix the problem it was intended to solve.
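For reference, a pass like initdb -S (which pg_rewind relies on) boils down to opening every file under the data directory and calling fsync() on it. The rough Python equivalent below is a hypothetical `fsync_tree` helper, not PostgreSQL's implementation. It makes Robert's point concrete: every descriptor is opened here, after the writes it is meant to cover, so an error recorded before that open may never be reported by these fsync() calls.

```python
import os


def fsync_tree(root):
    """Hypothetical helper: fsync every regular file and directory under root.

    Returns the number of regular files synced. Every descriptor is opened
    here, after the writes it is meant to cover were issued elsewhere.
    """
    synced_files = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets and FIFOs
            # Opened read-write because some platforms refuse fsync()
            # on a read-only descriptor.
            fd = os.open(path, os.O_RDWR)
            try:
                os.fsync(fd)  # raises OSError only if an error is still reported
            finally:
                os.close(fd)
            synced_files += 1
        # Directories cannot be opened read-write; fsync them to persist entries.
        dir_fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return synced_files
```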
It also means that initdb itself isn't crash-safe, since the data file changes are made by the backend but initdb itself is doing the fsyncs, and initdb has no way of knowing what files the backend is going to create and therefore can't -- even theoretically -- open them first. What's being presented to us as the API contract that we should expect from buffered I/O is that if you open a file and read() from it, call fsync(), and get no error, the kernel may nevertheless decide that some previous write that it never managed to flush can't be flushed, and then revert the page to the contents it had at some point in the past. That's more or less equivalent to letting a malicious adversary randomly overwrite database pages with plausible-looking but incorrect contents without notice and hoping you can still build a reliable system. You can avoid the problem if you can always open an fd for every file you want to modify before it's written and hold on to it until after it's fsync'd, but that's pretty hard to guarantee in the face of kill -9. I think the simplest technological solution to this problem is to rewrite the entire backend and all supporting processes to use O_DIRECT everywhere. To maintain adequate performance, we'll have to write a complete I/O scheduling system inside PostgreSQL. Also, since we'll now have to make shared_buffers much larger -- since we'll no longer be benefiting from the OS cache -- we'll need to replace the use of malloc() with an allocator that pulls from shared_buffers. Plus, as noted, we'll need to totally rearchitect several of our critical frontend tools. Let's freeze all other development for the next year while we work on that, and put out a notice that Linux is no longer a supported platform for any existing release. Before we do that, we might want to check whether fsync() actually writes the data to disk in a usable way even with O_DIRECT.
If not, we should just de-support Linux entirely as a hopelessly broken and unsupportable platform. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 19:13:14 </code></pre> Hi, On 2018-04-09 15:02:11 -0400, Robert Haas wrote: <blockquote> I think the simplest technological solution to this problem is to rewrite the entire backend and all supporting processes to use O_DIRECT everywhere. To maintain adequate performance, we'll have to write a complete I/O scheduling system inside PostgreSQL. Also, since we'll now have to make shared_buffers much larger -- since we'll no longer be benefiting from the OS cache -- we'll need to replace the use of malloc() with an allocator that pulls from shared_buffers. Plus, as noted, we'll need to totally rearchitect several of our critical frontend tools. Let's freeze all other development for the next year while we work on that, and put out a notice that Linux is no longer a supported platform for any existing release. Before we do that, we might want to check whether fsync() actually writes the data to disk in a usable way even with O_DIRECT. If not, we should just de-support Linux entirely as a hopelessly broken and unsupportable platform. </blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as are some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near irresolvable corner case than anything else. We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably. But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway.
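Even granting that the storage has failed, a caller that does see fsync() fail still has to decide what to do next, and Craig's panic-on-fsync-error patch rests on a simple rule: do not retry and carry on. On the kernels discussed here the failed pages may already have been marked clean, so a second fsync() can succeed without the data ever reaching disk. The sketch below contrasts a naive retry loop with a wrapper that escalates the first failure. `FatalStorageError`, both helpers and the `make_flaky_fsync` stub (which fails once with EIO and then reports success) are hypothetical.

```python
import errno
import os


class FatalStorageError(RuntimeError):
    """Hypothetical PANIC: stop and recover from the WAL instead of continuing."""


def naive_fsync_with_retry(fd, fsync=os.fsync, attempts=3):
    """Unsafe: treats a later successful fsync() as proof that earlier data is durable."""
    for _ in range(attempts):
        try:
            fsync(fd)
            return True
        except OSError:
            continue
    return False


def fsync_or_panic(fd, fsync=os.fsync):
    """The first failure is fatal and is never retried."""
    try:
        fsync(fd)
    except OSError as exc:
        raise FatalStorageError(f"fsync failed: {exc}") from exc


def make_flaky_fsync():
    """Stub kernel: reports EIO once, then 'succeeds' after the dirty pages were dropped."""
    state = {"failed": False}

    def flaky(fd):
        if not state["failed"]:
            state["failed"] = True
            raise OSError(errno.EIO, os.strerror(errno.EIO))
        # The lost write is never reported again.

    return flaky
```

With the stub, `naive_fsync_with_retry` returns True even though the write behind the first failure was never persisted, while `fsync_or_panic` stops at that first error.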
<hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 19:22:58 </code></pre> On 04/09/2018 08:29 PM, Mark Dilger wrote: <blockquote> <blockquote> On Apr 9, 2018, at 10:26 AM, Joshua D. Drake wrote: We have plenty of YEARS of people not noticing this issue </blockquote> I disagree. I have noticed this problem, but blamed it on other things. For over five years now, I have had to tell customers not to use thin provisioning, and I have had to add code to postgres to refuse to perform inserts or updates if the disk volume is more than 80% full. I have lost count of the number of customers who are running an older version of the product (because they refuse to upgrade) and come back with complaints that they ran out of disk and now their database is corrupt. All this time, I have been blaming this on virtualization and thin provisioning. </blockquote> Yeah. There's a big difference between not noticing an issue because it does not happen very often vs. attributing it to something else. If we had the ability to revisit past data corruption cases, we would probably discover a fair number of cases caused by this. The other thing we probably need to acknowledge is that the environment changes significantly - things like thin provisioning are likely to get even more common, increasing the incidence of these issues. <hr /> <pre><code>From:Peter Geoghegan <pg(at)bowt(dot)ie> Date:2018-04-09 19:25:33 </code></pre> On Mon, Apr 9, 2018 at 12:13 PM, Andres Freund wrote: <blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as are some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near irresolvable corner case than anything else. </blockquote> +1 <blockquote> We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably.
But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway. </blockquote> Right. We seem to be implicitly assuming that there is a big difference between a problem in the storage layer that we could in principle detect, but don't, and any other problem in the storage layer. I've read articles claiming that technologies like SMART are not really reliable in a practical sense [1], so it seems to me that there is reason to doubt that this gap is all that big. That said, I suspect that the problems with running out of disk space are serious practical problems. I have personally scoffed at stories involving Postgres database corruption that gets attributed to running out of disk space. Looks like I was dead wrong. [1] <a href="https://danluu.com/file-consistency/">https://danluu.com/file-consistency/</a> -- "Filesystem correctness" <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 19:26:21 </code></pre> On Mon, Apr 09, 2018 at 04:29:36PM +0100, Greg Stark wrote: <blockquote> Honestly I don't think there's any way to use the current interface to implement reliable operation. Even that embedded database using a single process and keeping every file open all the time (which means file descriptor limits limit its scalability) can be having silent corruption whenever some other process like a backup program comes along and calls fsync (or even sync?). </blockquote> That is indeed true (sync would induce fsync on open inodes and clear the error), and that's a nasty bug that apparently went unnoticed for a very long time. Hopefully the errseq_t linux 4.13 fixes deal with at least this issue, but similar fixes need to be adopted by many other kernels (all those that mark failed pages as clean).
I honestly do not expect that keeping around the failed pages will be an acceptable change for most kernels, and as such the recommendation will probably be to coordinate in userspace for the fsync(). What about having buffered IO with implied fsync() atomicity via O_SYNC? This would probably necessitate some helper threads that mask the latency and present an async interface to the rest of PG, but sounds less intrusive than going for DIO. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 19:29:16 </code></pre> On 2018-04-09 21:26:21 +0200, Anthony Iliopoulos wrote: <blockquote> What about having buffered IO with implied fsync() atomicity via O_SYNC? </blockquote> You're kidding, right? We could also just add sleep(30)'s all over the tree, and hope that that'll solve the problem. There's a reason we don't permanently fsync everything. Namely that it'll be way too slow. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 19:37:03 </code></pre> On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos wrote: <blockquote> I honestly do not expect that keeping around the failed pages will be an acceptable change for most kernels, and as such the recommendation will probably be to coordinate in userspace for the fsync(). </blockquote> Why is that required? You could very well just keep per inode information about fatal failures that occurred around. Report errors until that bit is explicitly cleared. Yes, that keeps some memory around until unmount if nobody clears it. But it's orders of magnitude less, and results in usable semantics. <hr /> <pre><code>From:Justin Pryzby <pryzby(at)telsasoft(dot)com> Date:2018-04-09 19:41:19 </code></pre> On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote: <blockquote> You could make the argument that it's OK to forget if the entire file system goes away. But actually, why is that ok? 
</blockquote> I was going to say that it'd be okay to clear the error flag on umount, since any opened files would prevent unmounting; but, then I realized we need to consider the case of close()ing all FDs then opening them later... in another process. I was going to say that's fine for postgres, since it chdir()s into its basedir, but actually not fine for nondefault tablespaces.. On Mon, Apr 09, 2018 at 02:54:16PM +0200, Anthony Iliopoulos wrote: <blockquote> notification descriptor open, where the kernel would inject events related to writeback failures of files under watch (potentially enriched to contain info regarding the exact failed pages and the file offset they map to). </blockquote> For postgres that'd require backend processes to open() a file such that, following its close(), any writeback errors are "signalled" to the checkpointer process... <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 19:44:31 </code></pre> On Mon, Apr 09, 2018 at 12:29:16PM -0700, Andres Freund wrote: <blockquote> On 2018-04-09 21:26:21 +0200, Anthony Iliopoulos wrote: <blockquote> What about having buffered IO with implied fsync() atomicity via O_SYNC? </blockquote> You're kidding, right? We could also just add sleep(30)'s all over the tree, and hope that that'll solve the problem. There's a reason we don't permanently fsync everything. Namely that it'll be way too slow. </blockquote> I am assuming you can apply the same principle of selectively using O_SYNC at times and places that you'd currently actually call fsync(). Also assuming that you'd want to have a backwards-compatible solution for all those kernels that don't keep the pages around, irrespective of future fixes. Short of loading a kernel module and dealing with the problem directly, the only other available options seem to be either O_SYNC, O_DIRECT or ignoring the issue.
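A sketch of what the selective O_SYNC suggestion could look like, assuming POSIX O_SYNC semantics. The descriptor is opened with O_SYNC only where the code would otherwise call fsync(), so each write() returns only after the data has been handed to stable storage, and a failure is expected to come back from that write() on the same descriptor rather than from a later fsync() that might run in another process. open_sync() and write_all() are hypothetical helper names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open for appending with O_SYNC, so each write()
 * returns only after the data (and the metadata needed to read it back)
 * has been handed to stable storage. */
static int open_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
}

/* Write the whole buffer. Returns 0 on success, -1 with errno set on
 * failure. With O_SYNC, an I/O error is expected to be reported here by
 * write() itself instead of by a later fsync(). Short writes and EINTR
 * are retried. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

This is also where the performance objection applies: every write_all() on such a descriptor waits for the device, so the approach only pays off for writes that would be followed by fsync() immediately anyway.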
<hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 19:47:44 </code></pre> On 04/09/2018 04:22 PM, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: <blockquote> We already have dirty_bytes and dirty_background_bytes, for example. I don't see why there couldn't be another limit defining how much dirty data to allow before blocking writes altogether. I'm sure it's not that simple, but you get the general idea - do not allow using all available memory because of writeback issues, but don't throw the data away in case it's just a temporary issue. </blockquote> Sure, there could be knobs for limiting how much memory such "zombie" pages may occupy. Not sure how helpful it would be in the long run since this tends to be highly application-specific, and for something with a large data footprint one would end up tuning this accordingly in a system-wide manner. This has the potential to leave other applications running in the same system with very little memory, in cases where for example original application crashes and never clears the error. Apart from that, further interfaces would need to be provided for actually dealing with the error (again assuming non-transient issues that may not be fixed transparently and that temporary issues are taken care of by lower layers of the stack). </blockquote> I don't quite see how this is any different from other possible issues when running multiple applications on the same system. One application can generate a lot of dirty data, reaching dirty_bytes and forcing the other applications on the same host to do synchronous writes. Of course, you might argue that is a temporary condition - it will resolve itself once the dirty pages get written to storage. In case of an I/O issue, it is a permanent impact - it will not resolve itself unless the I/O problem gets fixed. Not sure what interfaces would need to be written? 
Possibly something that says "drop dirty pages for these files" after the application gets killed or something. That makes sense, of course. <blockquote> <blockquote> Well, there seem to be kernels that seem to do exactly that already. At least that's how I understand what this thread says about FreeBSD and Illumos, for example. So it's not an entirely insane design, apparently. </blockquote> It is reasonable, but even FreeBSD has a big fat comment right there (since 2017), mentioning that there can be no recovery from EIO at the block layer and this needs to be done differently. No idea how an application running on top of either FreeBSD or Illumos would actually recover from this error (and clear it out), other than remounting the fs in order to force dropping of relevant pages. It does provide though indeed a persistent error indication that would allow Pg to simply reliably panic. But again this does not necessarily play well with other applications that may be using the filesystem reliably at the same time, and are now faced with EIO while their own writes succeed to be persisted. </blockquote> In my experience when you have a persistent I/O error on a device, it likely affects all applications using that device. So unmounting the fs to clear the dirty pages seems like an acceptable solution to me. I don't see what else the application should do? In a way I'm suggesting applications don't really want to be responsible for recovering (cleanup of dirty pages etc.). We're more than happy to hand that over to the kernel, e.g. because each kernel will do that differently. What we however do want is reliable information about fsync outcome, which we need to properly manage WAL, checkpoints etc. <blockquote> Ideally, you'd want a (potentially persistent) indication of error localized to a file region (mapping the corresponding failed writeback pages). NetBSD is already implementing fsync_range(), which could be a step in the right direction.
<blockquote> One has to wonder how many applications actually use this correctly, considering PostgreSQL cares about data durability/consistency so much and yet we've been misunderstanding how it works for 20+ years. </blockquote> I would expect it would be very few, potentially those that have a very simple process model (e.g. embedded DBs that can abort a txn on fsync() EIO). I think that durability is a rather complex cross-layer issue which has been grossly misunderstood similarly in the past (e.g. see [1]). It seems that both the OS and DB communities greatly benefit from a periodic reality check, and I see this as an opportunity for strengthening the IO stack in an end-to-end manner. </blockquote> Right. What I was getting to is that perhaps the current fsync() behavior is not very practical for building actual applications. <blockquote> Best regards, Anthony [1] <a href="https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf">https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf</a> </blockquote> Thanks. The paper looks interesting. <hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-09 19:51:12 </code></pre> On Mon, Apr 09, 2018 at 12:37:03PM -0700, Andres Freund wrote: <blockquote> On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos wrote: <blockquote> I honestly do not expect that keeping around the failed pages will be an acceptable change for most kernels, and as such the recommendation will probably be to coordinate in userspace for the fsync(). </blockquote> Why is that required? You could very well just keep per inode information about fatal failures that occurred around. Report errors until that bit is explicitly cleared. Yes, that keeps some memory around until unmount if nobody clears it. But it's orders of magnitude less, and results in usable semantics. 
</blockquote> As discussed before, I think this could be acceptable, especially if you pair it with an opt-in mechanism (only applications that care to deal with this will have to), and would give it a shot. Still need a way to deal with all other systems and prior kernel releases that are eating fsync() writeback errors even over sync(). <hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 19:54:05 </code></pre> On 04/09/2018 09:37 PM, Andres Freund wrote: <blockquote> On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos wrote: <blockquote> I honestly do not expect that keeping around the failed pages will be an acceptable change for most kernels, and as such the recommendation will probably be to coordinate in userspace for the fsync(). </blockquote> Why is that required? You could very well just keep per inode information about fatal failures that occurred around. Report errors until that bit is explicitly cleared. Yes, that keeps some memory around until unmount if nobody clears it. But it's orders of magnitude less, and results in usable semantics. </blockquote> Isn't the expectation that when a fsync call fails, the next one will retry writing the pages in the hope that it succeeds? Of course, it's also possible to do what you suggested, and simply mark the inode as failed. In which case the next fsync can't possibly retry the writes (e.g. after freeing some space on thin-provisioned system), but we'd get reliable failure mode. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 19:59:34 </code></pre> On 2018-04-09 14:41:19 -0500, Justin Pryzby wrote: <blockquote> On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote: <blockquote> You could make the argument that it's OK to forget if the entire file system goes away. But actually, why is that ok? 
</blockquote> I was going to say that it'd be okay to clear the error flag on umount, since any opened files would prevent unmounting; but, then I realized we need to consider the case of close()ing all FDs then opening them later... in another process. On Mon, Apr 09, 2018 at 02:54:16PM +0200, Anthony Iliopoulos wrote: <blockquote> notification descriptor open, where the kernel would inject events related to writeback failures of files under watch (potentially enriched to contain info regarding the exact failed pages and the file offset they map to). </blockquote> For postgres that'd require backend processes to open() a file such that, following its close(), any writeback errors are "signalled" to the checkpointer process... </blockquote> I don't think that's as hard as some people argued in this thread. We could very well open a pipe in postmaster with the write end open in each subprocess, and the read end open only in checkpointer (and postmaster, but unused there). Whenever closing a file descriptor that was dirtied in the current process, send it over the pipe to the checkpointer. The checkpointer then can receive all those file descriptors (making sure it's not above the limit, fsync()ing and close()ing to make room if necessary). The biggest complication would presumably be to deduplicate the received file descriptors for the same file, without losing track of any errors. Even better, we could do so via a dedicated worker. That'd quite possibly end up as a performance benefit. <blockquote> I was going to say that's fine for postgres, since it chdir()s into its basedir, but actually not fine for nondefault tablespaces.. </blockquote> I think it'd be fair to open PG_VERSION of all created tablespaces. Would require some hangups to signal checkpointer (or whichever process) to do so when creating one, but it shouldn't be too hard.
Some people would complain because they can't do some nasty hacks anymore, but it'd also save people's butts by preventing them from accidentally unmounting. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 20:04:20 </code></pre> Hi, On 2018-04-09 21:54:05 +0200, Tomas Vondra wrote: <blockquote> Isn't the expectation that when a fsync call fails, the next one will retry writing the pages in the hope that it succeeds? </blockquote> Some people expect that, I personally don't think it's a useful expectation. We should just deal with this by crash-recovery. The big problem I see is that you always need to keep a file descriptor open for pretty much any file written to inside and outside of postgres, to be guaranteed to see errors. And that'd solve that. Even if retrying would work, I'd advocate for that (I've done so in the past, and I've written code in pg that panics on fsync failure...). What we'd need to do however is to clear that bit during crash recovery... Which is interesting from a policy perspective. Could be that other apps wouldn't want that. I also wonder if we couldn't just somewhere read each relevant mounted filesystem's errseq value. Whenever checkpointer notices before finishing a checkpoint that it has changed, do a crash restart. <hr /> <pre><code>From:Mark Dilger <hornschnorter(at)gmail(dot)com> Date:2018-04-09 20:25:54 </code></pre> <blockquote> On Apr 9, 2018, at 12:13 PM, Andres Freund wrote: Hi, On 2018-04-09 15:02:11 -0400, Robert Haas wrote: <blockquote> I think the simplest technological solution to this problem is to rewrite the entire backend and all supporting processes to use O_DIRECT everywhere. To maintain adequate performance, we'll have to write a complete I/O scheduling system inside PostgreSQL.
Also, since we'll now have to make shared_buffers much larger -- since we'll no longer be benefiting from the OS cache -- we'll need to replace the use of malloc() with an allocator that pulls from shared_buffers. Plus, as noted, we'll need to totally rearchitect several of our critical frontend tools. Let's freeze all other development for the next year while we work on that, and put out a notice that Linux is no longer a supported platform for any existing release. Before we do that, we might want to check whether fsync() actually writes the data to disk in a usable way even with O_DIRECT. If not, we should just de-support Linux entirely as a hopelessly broken and unsupportable platform. </blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as is some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near irresolvable corner case than anything else. We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably. But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway. </blockquote> I was reading this thread up until now as meaning that the standby could receive corrupt WAL data and become corrupted. That seems a much bigger problem than merely having the master become corrupted in some unrecoverable way. It is a long standing expectation that serious hardware problems on the master can result in the master needing to be replaced. But there has not been an expectation that the one or more standby servers would be taken down along with the master, leaving all copies of the database unusable. If this bug corrupts the standby servers, too, then it is a whole different class of problem than the one folks have come to expect. 
Your comment reads as if this is a problem isolated to whichever server has the problem, and will not get propagated to other servers. Am I reading that right? Can anybody clarify this for non-core-hacker folks following along at home? <hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 20:30:00 </code></pre> On 04/09/2018 10:04 PM, Andres Freund wrote: <blockquote> Hi, On 2018-04-09 21:54:05 +0200, Tomas Vondra wrote: <blockquote> Isn't the expectation that when a fsync call fails, the next one will retry writing the pages in the hope that it succeeds? </blockquote> Some people expect that, I personally don't think it's a useful expectation. </blockquote> Maybe. I'd certainly prefer automated recovery from temporary I/O issues (like full disk on thin-provisioning) without the database crashing and restarting. But I'm not sure it's worth the effort. And most importantly, it's rather delusional to think the kernel developers are going to be enthusiastic about that approach ... <blockquote> We should just deal with this by crash-recovery. The big problem I see is that you always need to keep a file descriptor open for pretty much any file written to inside and outside of postgres, to be guaranteed to see errors. And that'd solve that. Even if retrying would work, I'd advocate for that (I've done so in the past, and I've written code in pg that panics on fsync failure...). </blockquote> Sure. And it's likely way less invasive from the kernel's perspective. <blockquote> What we'd need to do however is to clear that bit during crash recovery... Which is interesting from a policy perspective. Could be that other apps wouldn't want that. </blockquote> IMHO it'd be enough if a remount clears it. <blockquote> I also wonder if we couldn't just somewhere read each relevant mounted filesystem's errseq value. Whenever checkpointer notices before finishing a checkpoint that it has changed, do a crash restart.
</blockquote> Hmmmm, that's an interesting idea, and it's about the only thing that would help us on older kernels. There's a wb_err in address_space, but that's at inode level. Not sure if there's something at fs level. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 20:34:15 </code></pre> Hi, On 2018-04-09 13:25:54 -0700, Mark Dilger wrote: <blockquote> I was reading this thread up until now as meaning that the standby could receive corrupt WAL data and become corrupted. </blockquote> I don't see that as a real problem here. For one, the problematic scenarios shouldn't readily apply; for another, WAL is checksummed. There's the problem that a new basebackup would potentially become corrupted however. And similarly pg_rewind. Note that I'm not saying that we and/or linux shouldn't change anything. Just that the apocalypse isn't here. <blockquote> Your comment reads as if this is a problem isolated to whichever server has the problem, and will not get propagated to other servers. Am I reading that right? </blockquote> I think that's basically right. There are cases where corruption could get propagated, but they're not straightforward. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 20:37:31 </code></pre> Hi, On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote: <blockquote> Maybe. I'd certainly prefer automated recovery from temporary I/O issues (like full disk on thin-provisioning) without the database crashing and restarting. But I'm not sure it's worth the effort. </blockquote> Oh, I agree on that one. But that's more a question of how we force the kernel's hand on allocating disk space. In most cases the kernel allocates the disk space immediately, even if delayed allocation is in effect. For the cases where that's not the case (if there are current ones, rather than just past bugs), we should be able to make sure that's not an issue by pre-zeroing the data and/or using fallocate.
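A minimal sketch of the pre-allocation idea, assuming POSIX posix_fallocate(). Space for a byte range is reserved up front so that later buffered writes into it should not hit ENOSPC during writeback, with an explicit zero-fill followed by fsync() as a fallback when the filesystem rejects fallocate. reserve_space() is a hypothetical name. Note that posix_fallocate() returns an error number rather than setting errno, and that reserving blocks in the filesystem does not necessarily reserve space on a thin-provisioned volume underneath it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve [off, off + len) in the file so that later
 * buffered writes into that range should not fail with ENOSPC during
 * writeback. Returns 0 on success or an errno value on failure. */
static int reserve_space(int fd, off_t off, off_t len)
{
    assert(fd >= 0 && off >= 0 && len > 0);

    /* posix_fallocate() returns the error number; it does not set errno. */
    int rc = posix_fallocate(fd, off, len);
    if (rc != EINVAL && rc != EOPNOTSUPP)
        return rc;  /* 0 on success, or a real error such as ENOSPC */

    /* Fallback: write zeros explicitly ("pre-zeroing"), then fsync() so any
     * allocation error surfaces now, while the caller can still react. */
    static const char zeros[8192];
    while (len > 0) {
        size_t chunk = len < (off_t) sizeof zeros ? (size_t) len : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        off += n;
        len -= n;
    }
    return fsync(fd) == 0 ? 0 : errno;
}
```

Zero-filling is slower, but it actually writes the blocks, which is the "pre-zeroing" half of the suggestion; fallocate alone only asks the filesystem to reserve them.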
<hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 20:43:03 </code></pre> On 04/09/2018 10:25 PM, Mark Dilger wrote: <blockquote> <blockquote> On Apr 9, 2018, at 12:13 PM, Andres Freund wrote: Hi, On 2018-04-09 15:02:11 -0400, Robert Haas wrote: <blockquote> I think the simplest technological solution to this problem is to rewrite the entire backend and all supporting processes to use O_DIRECT everywhere. To maintain adequate performance, we'll have to write a complete I/O scheduling system inside PostgreSQL. Also, since we'll now have to make shared_buffers much larger -- since we'll no longer be benefiting from the OS cache -- we'll need to replace the use of malloc() with an allocator that pulls from shared_buffers. Plus, as noted, we'll need to totally rearchitect several of our critical frontend tools. Let's freeze all other development for the next year while we work on that, and put out a notice that Linux is no longer a supported platform for any existing release. Before we do that, we might want to check whether fsync() actually writes the data to disk in a usable way even with O_DIRECT. If not, we should just de-support Linux entirely as a hopelessly broken and unsupportable platform. </blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as is some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near irresolvable corner case than anything else. We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably. But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway. </blockquote> I was reading this thread up until now as meaning that the standby could receive corrupt WAL data and become corrupted. 
That seems a much bigger problem than merely having the master become corrupted in some unrecoverable way. It is a long standing expectation that serious hardware problems on the master can result in the master needing to be replaced. But there has not been an expectation that the one or more standby servers would be taken down along with the master, leaving all copies of the database unusable. If this bug corrupts the standby servers, too, then it is a whole different class of problem than the one folks have come to expect. Your comment reads as if this is a problem isolated to whichever server has the problem, and will not get propagated to other servers. Am I reading that right? Can anybody clarify this for non-core-hacker folks following along at home? </blockquote> That's a good question. I don't see any guarantee it'd be isolated to the master node. Consider this example: (0) checkpoint happens on the primary (1) a page gets modified, a full-page gets written to WAL (2) the page is written out to page cache (3) writeback of that page fails (and gets discarded) (4) we attempt to modify the page again, but we read the stale version (5) we modify the stale version, writing the change to WAL The standby will get the full-page, and then a WAL from the stale page version. That doesn't seem like a story with a happy end, I guess. But I might be easily missing some protection built into the WAL ... <hr /> <pre><code>From:Mark Dilger <hornschnorter(at)gmail(dot)com> Date:2018-04-09 20:55:29 </code></pre> <blockquote> On Apr 9, 2018, at 1:43 PM, Tomas Vondra wrote: On 04/09/2018 10:25 PM, Mark Dilger wrote: <blockquote> <blockquote> On Apr 9, 2018, at 12:13 PM, Andres Freund wrote: Hi, On 2018-04-09 15:02:11 -0400, Robert Haas wrote: <blockquote> I think the simplest technological solution to this problem is to rewrite the entire backend and all supporting processes to use O_DIRECT everywhere. 
To maintain adequate performance, we'll have to write a complete I/O scheduling system inside PostgreSQL. Also, since we'll now have to make shared_buffers much larger -- since we'll no longer be benefiting from the OS cache -- we'll need to replace the use of malloc() with an allocator that pulls from shared_buffers. Plus, as noted, we'll need to totally rearchitect several of our critical frontend tools. Let's freeze all other development for the next year while we work on that, and put out a notice that Linux is no longer a supported platform for any existing release. Before we do that, we might want to check whether fsync() actually writes the data to disk in a usable way even with O_DIRECT. If not, we should just de-support Linux entirely as a hopelessly broken and unsupportable platform. </blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as is some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near irresolvable corner case than anything else. We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably. But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway. </blockquote> I was reading this thread up until now as meaning that the standby could receive corrupt WAL data and become corrupted. That seems a much bigger problem than merely having the master become corrupted in some unrecoverable way. It is a long standing expectation that serious hardware problems on the master can result in the master needing to be replaced. But there has not been an expectation that the one or more standby servers would be taken down along with the master, leaving all copies of the database unusable. 
If this bug corrupts the standby servers, too, then it is a whole different class of problem than the one folks have come to expect. Your comment reads as if this is a problem isolated to whichever server has the problem, and will not get propagated to other servers. Am I reading that right? Can anybody clarify this for non-core-hacker folks following along at home? </blockquote> That's a good question. I don't see any guarantee it'd be isolated to the master node. Consider this example: (0) checkpoint happens on the primary (1) a page gets modified, a full-page gets written to WAL (2) the page is written out to page cache (3) writeback of that page fails (and gets discarded) (4) we attempt to modify the page again, but we read the stale version (5) we modify the stale version, writing the change to WAL The standby will get the full-page, and then a WAL from the stale page version. That doesn't seem like a story with a happy end, I guess. But I might be easily missing some protection built into the WAL ... </blockquote> I can also imagine a master and standby that are similarly provisioned, and thus hit an out of disk error at around the same time, resulting in corruption on both, even if not the same corruption. When choosing to have one standby, or two standbys, or ten standbys, one needs to be able to assume a certain amount of statistical independence between failures on one server and failures on another. If they are tightly correlated dependent variables, then the conclusion that the probability of all nodes failing simultaneously is vanishingly small becomes invalid. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-09 21:08:29 </code></pre> Hi, On 2018-04-09 13:55:29 -0700, Mark Dilger wrote: <blockquote> I can also imagine a master and standby that are similarly provisioned, and thus hit an out of disk error at around the same time, resulting in corruption on both, even if not the same corruption.
</blockquote> I think it's a grave mistake conflating ENOSPC issues (which we should solve by making sure there's always enough space pre-allocated), with EIO type errors. The problem is different, the solution is different. <hr /> <pre><code>From:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> Date:2018-04-09 21:25:52 </code></pre> On 04/09/2018 11:08 PM, Andres Freund wrote: <blockquote> Hi, On 2018-04-09 13:55:29 -0700, Mark Dilger wrote: <blockquote> I can also imagine a master and standby that are similarly provisioned, and thus hit an out of disk error at around the same time, resulting in corruption on both, even if not the same corruption. </blockquote> I think it's a grave mistake conflating ENOSPC issues (which we should solve by making sure there's always enough space pre-allocated), with EIO type errors. The problem is different, the solution is different. </blockquote> In any case, that certainly does not count as data corruption spreading from the master to standby. <hr /> <pre><code>From:Mark Dilger <hornschnorter(at)gmail(dot)com> Date:2018-04-09 21:33:29 </code></pre> <blockquote> On Apr 9, 2018, at 2:25 PM, Tomas Vondra wrote: On 04/09/2018 11:08 PM, Andres Freund wrote: <blockquote> Hi, On 2018-04-09 13:55:29 -0700, Mark Dilger wrote: <blockquote> I can also imagine a master and standby that are similarly provisioned, and thus hit an out of disk error at around the same time, resulting in corruption on both, even if not the same corruption. </blockquote> I think it's a grave mistake conflating ENOSPC issues (which we should solve by making sure there's always enough space pre-allocated), with EIO type errors. The problem is different, the solution is different. </blockquote> </blockquote> I'm happy to take your word for that. <blockquote> In any case, that certainly does not count as data corruption spreading from the master to standby. </blockquote> Maybe not from the point of view of somebody looking at the code. 
But a user might see it differently. If the data being loaded into the master and getting replicated to the standby "causes" both to get corrupt, then it seems like corruption spreading. I put "causes" in quotes because there is some argument to be made about "correlation does not prove cause" and so forth, but it still feels like causation from an arm's-length perspective. If there is a pattern of standby servers tending to fail more often right around the time that the master fails, you'll have a hard time comforting users, "hey, it's not technically causation." If loading data into the master causes the master to hit ENOSPC, and replicating that data to the standby causes the standby to hit ENOSPC, and if the bug around ENOSPC has not been fixed, then this looks like corruption spreading. I'm certainly planning on taking a hard look at the disk allocation on my standby servers right soon now. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-09 22:33:16 </code></pre> On Tue, Apr 10, 2018 at 2:22 AM, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote: <blockquote> Well, there seem to be kernels that seem to do exactly that already. At least that's how I understand what this thread says about FreeBSD and Illumos, for example. So it's not an entirely insane design, apparently. </blockquote> It is reasonable, but even FreeBSD has a big fat comment right there (since 2017), mentioning that there can be no recovery from EIO at the block layer and this needs to be done differently. No idea how an application running on top of either FreeBSD or Illumos would actually recover from this error (and clear it out), other than remounting the fs in order to force dropping of relevant pages. It does provide though indeed a persistent error indication that would allow Pg to simply reliably panic.
But again this does not necessarily play well with other applications that may be using the filesystem reliably at the same time, and are now faced with EIO while their own writes succeed to be persisted. </blockquote> Right. For anyone interested, here is the change you mentioned, and an interesting one that came a bit earlier last year: <ul> <li><a href="https://reviews.freebsd.org/rS316941">https://reviews.freebsd.org/rS316941</a> -- drop buffers after device goes away</li> <li><a href="https://reviews.freebsd.org/rS326029">https://reviews.freebsd.org/rS326029</a> -- update comment about EIO contract</li> </ul> Retrying may well be futile, but at least future fsync() calls won't report success bogusly. There may of course be more space-efficient ways to represent that state as the comment implies, while never lying to the user -- perhaps involving filesystem level or (pinned) inode level errors that stop all writes until unmounted. Something tells me they won't resort to flakey fsync() error reporting. I wonder if anyone can tell us what Windows, AIX and HPUX do here. <blockquote> [1] <a href="https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf">https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf</a> </blockquote> Very interesting, thanks. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-10 00:32:20 </code></pre> On Tue, Apr 10, 2018 at 10:33 AM, Thomas Munro wrote: <blockquote> I wonder if anyone can tell us what Windows, AIX and HPUX do here. </blockquote> I created a wiki page to track what we know (or think we know) about fsync() on various operating systems: <a href="https://wiki.postgresql.org/wiki/Fsync_Errors">https://wiki.postgresql.org/wiki/Fsync_Errors</a> If anyone has more information or sees mistakes, please go ahead and edit it. 
<hr /> <pre><code>From:Andreas Karlsson <andreas(at)proxel(dot)se> Date:2018-04-10 00:41:10 </code></pre> On 04/09/2018 02:16 PM, Craig Ringer wrote: <blockquote> I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style per-directory-registration option. I'd even say that's ideal. </blockquote> Could there be a risk of a race condition here where fsync incorrectly returns success before we get the notification that something went wrong? <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-10 01:44:59 </code></pre> On 10 April 2018 at 03:59, Andres Freund wrote: <blockquote> On 2018-04-09 14:41:19 -0500, Justin Pryzby wrote: <blockquote> On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote: <blockquote> You could make the argument that it's OK to forget if the entire file system goes away. But actually, why is that ok? </blockquote> I was going to say that it'd be okay to clear the error flag on umount, since any opened files would prevent unmounting; but then I realized we need to consider the case of close()ing all FDs and then opening them later... in another process. On Mon, Apr 09, 2018 at 02:54:16PM +0200, Anthony Iliopoulos wrote: <blockquote> notification descriptor open, where the kernel would inject events related to writeback failures of files under watch (potentially enriched to contain info regarding the exact failed pages and the file offset they map to). </blockquote> For postgres that'd require backend processes to open() a file such that, following its close(), any writeback errors are "signalled" to the checkpointer process... </blockquote> I don't think that's as hard as some people argued in this thread.
We could very well open a pipe in postmaster with the write end open in each subprocess, and the read end open only in checkpointer (and postmaster, but unused there). Whenever closing a file descriptor that was dirtied in the current process, send it over the pipe to the checkpointer. The checkpointer then can receive all those file descriptors (making sure it's not above the limit, fsync()ing and close()ing to make room if necessary). The biggest complication would presumably be to deduplicate the received file descriptors for the same file, without losing track of any errors. </blockquote> Yep. That'd be a cheaper way to do it, though it wouldn't work on Windows. Though we don't know how Windows behaves here at all yet. Prior discussion upthread had the checkpointer open()ing a file at the same time as a backend, before the backend writes to it. But passing the fd when the backend is done with it would be better. We'd need a way to dup() the fd and pass it back to a backend when it needed to reopen it sometimes, or just make sure to keep the oldest copy of the fd when a backend reopens multiple times, but that's no biggie. We'd still have to fsync() out early in the checkpointer if we ran out of space in our FD list, and initscripts would need to change our ulimit or we'd have to do it ourselves in the checkpointer. But neither seems insurmountable. FWIW, I agree that this is a corner case, but it's getting to be a pretty big corner with the spread of overcommitted, deduplicating SANs, cloud storage, etc. Not all I/O errors indicate permanent hardware faults, disk failures, etc., as I outlined earlier. I'm very curious to know what the error semantics of AWS EBS and other cloud network block stores are. (I posted on Amazon forums <a href="https://forums.aws.amazon.com/thread.jspa?threadID=279274&tstart=0">https://forums.aws.amazon.com/thread.jspa?threadID=279274&tstart=0</a> but nothing so far).
I'm also not particularly inclined to trust that all file systems will always reliably reserve space without having some cases where they'll fail writeback on space exhaustion. So we don't need to panic and freak out, but it's worth looking at the direction the storage world is moving in, and whether this will become a bigger issue over time. <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-10 01:52:21 </code></pre> On Tue, Apr 10, 2018 at 1:44 PM, Craig Ringer wrote: <blockquote> On 10 April 2018 at 03:59, Andres Freund wrote: <blockquote> I don't think that's as hard as some people argued in this thread. We could very well open a pipe in postmaster with the write end open in each subprocess, and the read end open only in checkpointer (and postmaster, but unused there). Whenever closing a file descriptor that was dirtied in the current process, send it over the pipe to the checkpointer. The checkpointer then can receive all those file descriptors (making sure it's not above the limit, fsync()ing and close()ing to make room if necessary). The biggest complication would presumably be to deduplicate the received file descriptors for the same file, without losing track of any errors. </blockquote> Yep. That'd be a cheaper way to do it, though it wouldn't work on Windows. Though we don't know how Windows behaves here at all yet. Prior discussion upthread had the checkpointer open()ing a file at the same time as a backend, before the backend writes to it. But passing the fd when the backend is done with it would be better. </blockquote> How would that interlock with concurrent checkpoints? I can see how to make that work if the share-fd-or-fsync-now logic happens in smgrwrite() when called by FlushBuffer() while you hold io_in_progress, but not if you defer it to some random time later.
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-10 01:54:30 </code></pre> On 10 April 2018 at 04:25, Mark Dilger wrote: <blockquote> I was reading this thread up until now as meaning that the standby could receive corrupt WAL data and become corrupted. </blockquote> Yes, it can, but not directly through the first error. What can happen is that we think a block got written when it didn't. If our in-memory state diverges from our on-disk state, we can make subsequent WAL writes based on that wrong information. But that's actually OK, since the standby will have replayed the original WAL correctly. I think the only time we'd run into trouble is if we evict the good (but not written out) data from s_b and the fs buffer cache, then later read in the old version of a block we failed to overwrite. Data checksums (if enabled) might catch it unless the write left the whole block stale. In that case we might generate a full-page write with the stale block and propagate that over WAL to the standby. So I'd say standbys are relatively safe: very safe if the issue is caught promptly, and less so over time. But AFAICS WAL-based replication (physical or logical) is not a perfect defense for this. However, remember, if your storage system is free of any sort of overprovisioning, is on a non-network file system, and doesn't use multipath (or sets it up right), this issue is exceptionally unlikely to affect you. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-10 01:59:03 </code></pre> On 10 April 2018 at 04:37, Andres Freund wrote: <blockquote> Hi, On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote: <blockquote> Maybe. I'd certainly prefer automated recovery from temporary I/O issues (like full disk on thin-provisioning) without the database crashing and restarting. But I'm not sure it's worth the effort. </blockquote> Oh, I agree on that one.
But that's more a question of how we force the kernel's hand on allocating disk space. In most cases the kernel allocates the disk space immediately, even if delayed allocation is in effect. For the cases where that's not the case (if there are current ones, rather than just past bugs), we should be able to make sure that's not an issue by pre-zeroing the data and/or using fallocate. </blockquote> Nitpick: In most cases the kernel reserves disk space immediately, before returning from write(). NFS seems to be the main exception here. EXT4 and XFS don't allocate until later, i.e. by performing actual writes to FS metadata, initializing disk blocks, etc. So we won't notice errors that are only detectable at the actual time of allocation, like thin provisioning problems, until after write() returns and we face the same writeback issues. So I reckon you're safe from space-related issues if you're not on NFS (and whyyy would you do that?) and not thinly provisioned. I'm sure there are other corner cases, but I don't see any reason to expect space-exhaustion-related corruption problems on a sensible FS backed by a sensible block device. I haven't tested things like quotas, verified how reliable space reservation is under concurrency, etc., as yet. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-10 02:00:59 </code></pre> On April 9, 2018 6:59:03 PM PDT, Craig Ringer wrote: <blockquote> On 10 April 2018 at 04:37, Andres Freund wrote: <blockquote> Hi, On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote: <blockquote> Maybe. I'd certainly prefer automated recovery from temporary I/O issues (like full disk on thin-provisioning) without the database crashing and restarting. But I'm not sure it's worth the effort. </blockquote> Oh, I agree on that one. But that's more a question of how we force the kernel's hand on allocating disk space. In most cases the kernel allocates the disk space immediately, even if delayed allocation is in effect.
For the cases where that's not the case (if there are current ones, rather than just past bugs), we should be able to make sure that's not an issue by pre-zeroing the data and/or using fallocate. </blockquote> Nitpick: In most cases the kernel reserves disk space immediately, before returning from write(). NFS seems to be the main exception here. EXT4 and XFS don't allocate until later, i.e. by performing actual writes to FS metadata, initializing disk blocks, etc. So we won't notice errors that are only detectable at the actual time of allocation, like thin provisioning problems, until after write() returns and we face the same writeback issues. So I reckon you're safe from space-related issues if you're not on NFS (and whyyy would you do that?) and not thinly provisioned. I'm sure there are other corner cases, but I don't see any reason to expect space-exhaustion-related corruption problems on a sensible FS backed by a sensible block device. I haven't tested things like quotas, verified how reliable space reservation is under concurrency, etc., as yet. </blockquote> How's that not solved by pre-zeroing and/or fallocate as I suggested above? <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-10 02:02:48 </code></pre> On 10 April 2018 at 08:41, Andreas Karlsson wrote: <blockquote> On 04/09/2018 02:16 PM, Craig Ringer wrote: <blockquote> I'd like a middle ground where the kernel lets us register our interest and tells us if it lost something, without us having to keep eight million FDs open for some long period. "Tell us about anything that happens under pgdata/" or an inotify-style per-directory-registration option. I'd even say that's ideal. </blockquote> Could there be a risk of a race condition here where fsync incorrectly returns success before we get the notification that something went wrong?
</blockquote> We'd examine the notification queue only once all our checkpoint fsync()s had succeeded, and before we updated the control file to advance the redo position. I'm intrigued by the suggestion upthread of using a kprobe or similar to achieve this. It's a horrifying unportable hack that'd make kernel people cry, and I don't know if we have any way to flush buffered probe data to be sure we really get the news in time, but it's a cool idea too. <hr /> <pre><code>From:Michael Paquier <michael(at)paquier(dot)xyz> Date:2018-04-10 05:04:13 </code></pre> On Mon, Apr 09, 2018 at 03:02:11PM -0400, Robert Haas wrote: <blockquote> Another consequence of this behavior is that initdb -S is never reliable, so pg_rewind's use of it doesn't actually fix the problem it was intended to solve. It also means that initdb itself isn't crash-safe, since the data file changes are made by the backend but initdb itself is doing the fsyncs, and initdb has no way of knowing what files the backend is going to create and therefore can't -- even theoretically -- open them first. </blockquote> And pg_basebackup. And pg_dump. And pg_dumpall. Anything using initdb -S or fsync_pgdata would enter in those waters. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-10 05:37:19 </code></pre> On 10 April 2018 at 13:04, Michael Paquier wrote: <blockquote> On Mon, Apr 09, 2018 at 03:02:11PM -0400, Robert Haas wrote: <blockquote> Another consequence of this behavior is that initdb -S is never reliable, so pg_rewind's use of it doesn't actually fix the problem it was intended to solve. It also means that initdb itself isn't crash-safe, since the data file changes are made by the backend but initdb itself is doing the fsyncs, and initdb has no way of knowing what files the backend is going to create and therefore can't -- even theoretically -- open them first. </blockquote> And pg_basebackup. And pg_dump. And pg_dumpall.
Anything using initdb -S or fsync_pgdata would enter in those waters. </blockquote> ... but only if they hit an I/O error or they're on a FS that doesn't reserve space and hit ENOSPC. It still does 99% of the job. It still flushes all buffers to persistent storage and maintains write ordering. It may not detect and report failures to the user the way we'd expect it to, yes, and that's not great. But it's hardly throw-up-our-hands-and-give-up territory either. Also, at least for initdb, we can make initdb fsync() its own files before close(). Annoying but hardly the end of the world. <hr /> <pre><code>From:Michael Paquier <michael(at)paquier(dot)xyz> Date:2018-04-10 06:10:21 </code></pre> On Tue, Apr 10, 2018 at 01:37:19PM +0800, Craig Ringer wrote: <blockquote> On 10 April 2018 at 13:04, Michael Paquier wrote: <blockquote> And pg_basebackup. And pg_dump. And pg_dumpall. Anything using initdb -S or fsync_pgdata would enter in those waters. </blockquote> ... but only if they hit an I/O error or they're on a FS that doesn't reserve space and hit ENOSPC. </blockquote> Sure. <blockquote> It still does 99% of the job. It still flushes all buffers to persistent storage and maintains write ordering. It may not detect and report failures to the user the way we'd expect it to, yes, and that's not great. But it's hardly throw-up-our-hands-and-give-up territory either. Also, at least for initdb, we can make initdb fsync() its own files before close(). Annoying but hardly the end of the world. </blockquote> Well, I think that there is room for improving reporting of failure in file_utils.c for frontends, or at worst have an exit() for any kind of critical failures equivalent to a PANIC.
<hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-10 12:15:15 </code></pre> On 10 April 2018 at 14:10, Michael Paquier wrote: <blockquote> Well, I think that there is room for improving reporting of failure in file_utils.c for frontends, or at worst have an exit() for any kind of critical failures equivalent to a PANIC. </blockquote> Yup. In the meantime, speaking of PANIC, here's the first-cut patch to make Pg panic on fsync() failures. I need to do some closer review and testing, but it's presented here for anyone interested. I intentionally left some failures as ERROR, not PANIC, where the entire operation is done as a unit, and an ERROR will cause us to retry the whole thing. For example, when we fsync() a temp file before we move it into place, there's no point panicking on failure, because we'll discard the temp file on ERROR and retry the whole thing. I've verified that it works as expected with some modifications to the test tool I've been using (pushed). The main downside is that if we panic in redo, we don't try again. We throw our toys and shut down. But arguably if we get the same I/O error again in redo, that's the right thing to do anyway, and quite likely safer than continuing to ERROR on checkpoints indefinitely. Patch attached. To be clear, this patch only deals with the issue of us retrying fsyncs when it turns out to be unsafe. This does NOT address any of the issues where we won't find out about writeback errors at all. Attachment: v1-0001-PANIC-when-we-detect-a-possible-fsync-I-O-error-i.patch (text/x-patch, 10.3 KB) <hr /> <pre><code>From:Robert Haas <robertmhaas(at)gmail(dot)com> Date:2018-04-10 15:15:46 </code></pre> On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund wrote: <blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as are some of the proposed ways this is all supposed to work.
But I think the case we're discussing is much closer to a near-irresolvable corner case than anything else. </blockquote> Well, I admit that I wasn't entirely serious about that email, but I wasn't entirely not-serious either. If you can't reliably find out whether the contents of the file on disk are the same as the contents that the kernel is giving you when you call read(), then you are going to have a heck of a time building a reliable system. If the kernel developers are determined to insist on these semantics (and, admittedly, I don't know whether that's the case; I've only read Anthony's remarks), then I don't really see what we can do except give up on buffered I/O (or on Linux). <blockquote> We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably. But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway. </blockquote> I think that reliable error reporting is more than "nice" -- I think it's essential. The only argument for the current Linux behavior that has been so far advanced on this thread, at least as far as I can see, is that if it kept retrying the buffers forever, it would be pointless and might run the machine out of memory, so we might as well discard them. But previous comments have already illustrated that the kernel is not really up against a wall there -- it could put individual inodes into a permanent failure state when it discards their dirty data, as you suggested, or it could do what others have suggested, and what I think is better, which is to put the whole filesystem into a permanent failure state that can be cleared by remounting the FS.
That could be done on an as-needed basis -- if the number of dirty buffers you're holding onto for some filesystem becomes too large, put the filesystem into infinite-fail mode and discard them all. That behavior would be pretty easy for administrators to understand and would resolve the entire problem here, provided that no PostgreSQL processes survived the eventual remount. I also don't really know what we mean by an "unresolvable" error. If the drive is beyond all hope, then it doesn't really make sense to talk about whether the database stored on it is corrupt. In general we can't be sure that we'll even get an error; e.g., the system could be idle and the drive could be on fire. Maybe this is the case you meant by "it'd be nice if we could report it reliably". But at least in my experience, that's typically not what's going on. You get some I/O errors and so you remount the filesystem, or reboot, or rebuild the array, or ... something. And then the errors go away and, at that point, you want to run recovery and continue using your database. In this scenario, it matters quite a bit what the error reporting was like during the period when failures were occurring. In particular, if the database was allowed to think that it had successfully checkpointed when it didn't, you're going to start recovery from the wrong place. I'm going to shut up now because I'm telling you things that you obviously already know, but this doesn't sound like a "near-irresolvable corner case". When the storage goes bonkers, either PostgreSQL and the kernel can interact in such a way that a checkpoint can succeed without all of the relevant data getting persisted, or they can't. It sounds like right now they can, and I'm not really clear that we have a reasonable idea how to fix that. It does not sound like a PANIC is sufficient.
<hr /> <pre><code>From:Robert Haas <robertmhaas(at)gmail(dot)com> Date:2018-04-10 15:28:07 </code></pre> On Tue, Apr 10, 2018 at 1:37 AM, Craig Ringer wrote: <blockquote> ... but only if they hit an I/O error or they're on a FS that doesn't reserve space and hit ENOSPC. It still does 99% of the job. It still flushes all buffers to persistent storage and maintains write ordering. It may not detect and report failures to the user the way we'd expect it to, yes, and that's not great. But it's hardly throw-up-our-hands-and-give-up territory either. Also, at least for initdb, we can make initdb fsync() its own files before close(). Annoying but hardly the end of the world. </blockquote> I think we'd need every child postgres process started by initdb to do that individually, which I suspect would slow down initdb quite a lot. Now admittedly for anybody other than a PostgreSQL developer that's only a minor issue, and our regression tests mostly run with fsync=off anyway. But I have a strong suspicion that our assumptions about how fsync() reports errors are baked into an awful lot of parts of the system, and by the time we finish unbaking them I think it's going to be really surprising if we haven't done real harm to overall system performance. BTW, I took a look at the MariaDB source code to see whether they've got this problem too, and it sure looks like they do. os_file_fsync_posix() retries the fsync in a loop with a 0.2-second sleep after each retry. It warns after 100 failures and fails an assertion after 1000 failures. It is hard to understand why they would have written the code this way unless they expect errors reported by fsync() to continue being reported until the underlying condition is corrected. But it looks like they wouldn't have the problem that we do with trying to reopen files to fsync() them later -- I spot-checked a few places where this code is invoked and in all of those it looks like the file is already expected to be open.
<hr /> <pre><code>From:Anthony Iliopoulos <ailiop(at)altatus(dot)com> Date:2018-04-10 15:40:05 </code></pre> Hi Robert, On Tue, Apr 10, 2018 at 11:15:46AM -0400, Robert Haas wrote: <blockquote> On Mon, Apr 9, 2018 at 3:13 PM, Andres Freund wrote: <blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as are some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near-irresolvable corner case than anything else. </blockquote> Well, I admit that I wasn't entirely serious about that email, but I wasn't entirely not-serious either. If you can't reliably find out whether the contents of the file on disk are the same as the contents that the kernel is giving you when you call read(), then you are going to have a heck of a time building a reliable system. If the kernel developers are determined to insist on these semantics (and, admittedly, I don't know whether that's the case; I've only read Anthony's remarks), then I don't really see what we can do except give up on buffered I/O (or on Linux). </blockquote> I think it would be interesting to get in touch with some of the respective Linux kernel maintainers and open up this topic for more detailed discussions. LSF/MM'18 is upcoming and it would have been the perfect opportunity, but it's past the CFP deadline. It may still be worth contacting the organizers to bring forward the issue, and see if there is a chance to have someone from Pg invited for further discussions. <hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-10 16:38:27 </code></pre> On 9 April 2018 at 11:50, Anthony Iliopoulos wrote: <blockquote> On Mon, Apr 09, 2018 at 09:45:40AM +0100, Greg Stark wrote: <blockquote> On 8 April 2018 at 22:47, Anthony Iliopoulos wrote: </blockquote> To make things a bit simpler, let us focus on EIO for the moment.
The contract between the block layer and the filesystem layer is assumed to be that of, when an EIO is propagated up to the fs, then you may assume that all possibilities for recovering have been exhausted in lower layers of the stack. </blockquote> Well Postgres is using the filesystem. The interface between the block layer and the filesystem may indeed need to be more complex, I wouldn't know. But I don't think "all possibilities" is a very useful concept. Neither layer here is going to be perfect. They can only promise that all possibilities that have actually been implemented have been exhausted. And even among those only to the degree they can be done automatically within the engineering tradeoffs and constraints. There will always be cases like thin provisioned devices that an operator can expand, or degraded raid arrays that can be repaired after a long operation and so on. A network device can't be sure whether a remote server may eventually come back or not and have to be reconfigured by a human or system automation tool to point to the new server or new network configuration. <blockquote> Right. This implies though that apart from the kernel having to keep around the dirtied-but-unrecoverable pages for an unbounded time, that there's further an interface for obtaining the exact failed pages so that you can read them back. </blockquote> No, the interface we have is fsync which gives us that information with the granularity of a single file. The database could in theory recognize that fsync is not completing on a file and read that file back and write it to a new file. More likely we would implement a feature Oracle has of writing key files to multiple devices. But currently in practice that's not what would happen, what would happen would be a human would recognize that the database has stopped being able to commit and there are hardware errors in the log and would stop the database, take a backup, and restore onto a new working device. 
The current interface is that there's one error and then Postgres would pretty much have to say, "sorry, your database is corrupt and the data is gone, restore from your backups", which is pretty dismal. <blockquote> There is a clear responsibility of the application to keep its buffers around until a successful fsync(). The kernels do report the error (albeit with all the complexities of dealing with the interface), at which point the application may not assume that the write()s were ever even buffered in the kernel page cache in the first place. </blockquote> Postgres cannot just store the entire database in RAM. It writes things to the filesystem all the time. It calls fsync only when it needs a write barrier to ensure consistency. That's only frequent on the transaction log to ensure it's flushed before data modifications and then periodically to checkpoint the data files. The amount of data written between checkpoints can be arbitrarily large, and Postgres has no idea how much memory is available as filesystem buffers, how much I/O bandwidth is available, or how much other memory pressure there is. What you're suggesting is that the application should have to babysit the filesystem buffer cache and reimplement all of it in user-space because the filesystem is free to throw away any data any time it chooses? The current interface to throw away filesystem buffer cache is unmount. It sounds like the kernel would like a more granular way to discard just part of a device, which makes a lot of sense in the age of large network block devices. But I don't think just saying that the filesystem buffer cache is now something every application needs to re-implement in user-space really helps with that; they're going to have the same problems to solve.
<hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-10 16:54:40 </code></pre> On 10 April 2018 at 02:59, Craig Ringer wrote: <blockquote> Nitpick: In most cases the kernel reserves disk space immediately, before returning from write(). NFS seems to be the main exception here. </blockquote> I'm kind of puzzled by this. Surely NFS servers store the data in the filesystem using write(2) or the in-kernel equivalent? So if the server is backed by a filesystem where write(2) preallocates space, surely the NFS server must behave as if it's preallocating as well? I would expect NFS to provide basically the same set of possible failures as the underlying filesystem (as long as you don't enable nosync of course). <hr /> <pre><code>From:"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> Date:2018-04-10 18:58:37 </code></pre> -hackers, I reached out to the Linux ext4 devs; here is tytso(at)mit(dot)edu's response: """ Hi Joshua, This isn't actually an ext4 issue, but a long-standing VFS/MM issue. There are going to be multiple opinions about what the right thing to do is. I'll try to give as unbiased a description as possible, but certainly some of this is going to be filtered by my own biases no matter how careful I can be. First of all, what storage devices will do when they hit an exception condition is quite non-deterministic. For example, the vast majority of SSDs are not power fail certified. What this means is that if they suffer a power drop while they are doing a GC, it is quite possible for data written six months ago to be lost as a result. The LBA could potentially be far, far away from any LBAs that were recently written, and there could have been multiple CACHE FLUSH operations since the LBA in question was last written six months ago. No matter; for a consumer-grade SSD, it's possible for that LBA to be trashed after an unexpected power drop.
Which is why after a while, one can get quite paranoid and assume that the only way you can guarantee data robustness is to store multiple copies and/or use erasure encoding, with some of the copies or shards written to geographically diverse data centers. Secondly, I think it's fair to say that the vast majority of the companies who require data robustness, and are either willing to pay $$$ to an enterprise distro company like Red Hat, or command a large enough paying customer base that they can afford to dictate terms to an enterprise distro, or hire a consultant such as Christoph, or have their own staffed Linux kernel teams, have tended to use O_DIRECT. So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. Next, the reason why fsync() has the behaviour that it does is that one of the most common cases of I/O storage errors in buffered use cases, certainly as seen by the community distros, is the user who pulls out a USB stick while it is in use. In that case, if there are dirtied pages in the page cache, the question is what can you do? Sooner or later the writes will time out, and if you leave the pages dirty, then it effectively becomes a permanent memory leak. You can't unmount the file system --- that requires writing out all of the pages such that the dirty bit is turned off. And if you don't clear the dirty bit on an I/O error, then they can never be cleaned. You can't even re-insert the USB stick; the re-inserted USB stick will get a new block device. Worse, when the USB stick was pulled, it will have suffered a power drop, and see above about what could happen after a power drop for non-power fail certified flash devices --- it goes double for the cheap sh*t USB sticks found in the checkout aisle of Micro Center. So this is the explanation for why Linux handles I/O errors by clearing the dirty bit after reporting the error up to user space.
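The behaviour described here can be illustrated with a toy Python model. It is not kernel code and is deliberately simplified: it tracks one file and one error flag, and ignores per-file-descriptor error bookkeeping. In the model, a failed writeback is reported once and the page is marked clean anyway. As a result, a second fsync() "succeeds" even though the data never reached the device. `ToyPageCache` and `WritebackError` are hypothetical names.

```python
class WritebackError(OSError):
    pass


class ToyPageCache:
    """Toy model (not kernel code): when writeback of a dirty page fails,
    the error is reported once and the page is marked clean anyway."""

    def __init__(self) -> None:
        self.disk: dict[int, bytes] = {}   # what the device durably holds
        self.dirty: dict[int, bytes] = {}  # dirty pages awaiting writeback
        self.pending_error = False         # error not yet seen by fsync()
        self.device_ok = True

    def write(self, page: int, data: bytes) -> None:
        self.dirty[page] = data            # write() only dirties the cache

    def writeback(self) -> None:
        for page, data in list(self.dirty.items()):
            if self.device_ok:
                self.disk[page] = data
            else:
                self.pending_error = True  # remember the failure...
            del self.dirty[page]           # ...but clear the dirty bit anyway

    def fsync(self) -> None:
        self.writeback()
        if self.pending_error:
            self.pending_error = False     # the error is reported only once
            raise WritebackError("EIO during writeback")
```

With `device_ok = False`, the first `fsync()` raises. A retry after the device recovers returns normally, yet page 0 is absent from `disk`. On kernels with these semantics, that is why retrying fsync() is not a safe recovery strategy.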
And that is why there is no eagerness to solve the problem simply by "don't clear the dirty bit". For every one Postgres installation that might have a better recovery after an I/O error, there are probably a thousand clueless Fedora and Ubuntu users who will have a much worse user experience after a USB stick pull happens. I can think of things that could be done --- for example, it could be switchable on a per-block device basis (or maybe a per-mount basis) whether or not the dirty bit gets cleared after the error is reported to userspace. And perhaps there could be a new unmount flag that causes all dirty pages to be wiped out, which could be used to recover after a permanent loss of the block device. But the question is who is going to invest the time to make these changes? If there is a company who is willing to pay to commission this work, it's almost certainly soluble. Or if a company which has a kernel team on staff is willing to direct an engineer to work on it, it certainly could be solved. But again, of the companies who have client code where we care about robustness and proper handling of failed disk drives, and which have a kernel team on staff, pretty much all of the ones I can think of (e.g., Oracle, Google, etc.) use O_DIRECT and they don't try to make buffered writes and error reporting via fsync(2) work well. In general these companies want low-level control over buffer cache eviction algorithms, which drives them towards the design decision of effectively implementing the page cache in userspace, and using O_DIRECT reads/writes. If you are aware of a company who is willing to pay to have a new kernel feature implemented to meet your needs, we might be able to refer you to a company or a consultant who might be able to do that work. Let me know off-line if that's the case...
<pre><code>- Ted </code></pre> """ <hr /> <pre><code>From:"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> Date:2018-04-10 19:51:01 </code></pre> -hackers, The thread is picking up over on the ext4 list. They don't update their archives as often as we do, so I can't link to the discussion. What would be the preferred method of sharing the info? Thanks, <hr /> <pre><code>From:"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> Date:2018-04-10 20:57:34 </code></pre> On 04/10/2018 12:51 PM, Joshua D. Drake wrote: <blockquote> -hackers, The thread is picking up over on the ext4 list. They don't update their archives as often as we do, so I can't link to the discussion. What would be the preferred method of sharing the info? </blockquote> Thanks to Anthony for this link: <a href="http://lists.openwall.net/linux-ext4/2018/04/10/33">http://lists.openwall.net/linux-ext4/2018/04/10/33</a> It isn't quite real time, but it keeps things close enough. <hr /> <pre><code>From:Jonathan Corbet <corbet(at)lwn(dot)net> Date:2018-04-11 12:05:27 </code></pre> On Tue, 10 Apr 2018 17:40:05 +0200 Anthony Iliopoulos wrote: <blockquote> LSF/MM'18 is upcoming and it would have been the perfect opportunity, but it's past the CFP deadline. It may still be worth contacting the organizers to bring forward the issue, and see if there is a chance to have someone from Pg invited for further discussions. </blockquote> FWIW, it is my current intention to be sure that the development community is at least aware of the issue by the time LSFMM starts. The event is April 23-25 in Park City, Utah. I bet that room could be found for somebody from the postgresql community, should there be somebody who would like to represent the group on this issue. Let me know if an introduction or advocacy from my direction would be helpful. <hr /> <pre><code>From:Greg Stark <stark(at)mit(dot)edu> Date:2018-04-11 12:23:49 </code></pre> On 10 April 2018 at 19:58, Joshua D.
Drake wrote: <blockquote> You can't unmount the file system --- that requires writing out all of the pages such that the dirty bit is turned off. </blockquote> I always wondered why Linux didn't implement umount -f. It's been in BSD since forever and it's a major annoyance that it's missing in Linux. Even without leaking memory it still leaks other resources, causes confusion and awkward workarounds in UI and automation software. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-11 14:29:09 </code></pre> Hi, On 2018-04-11 06:05:27 -0600, Jonathan Corbet wrote: <blockquote> The event is April 23-25 in Park City, Utah. I bet that room could be found for somebody from the postgresql community, should there be somebody who would like to represent the group on this issue. Let me know if an introduction or advocacy from my direction would be helpful. </blockquote> If that room can be found, I might be able to make it. Being in SF, I'm probably the physically closest PG dev involved in the discussion. Thanks for chiming in, <hr /> <pre><code>From:Jonathan Corbet <corbet(at)lwn(dot)net> Date:2018-04-11 14:40:31 </code></pre> On Wed, 11 Apr 2018 07:29:09 -0700 Andres Freund wrote: <blockquote> If that room can be found, I might be able to make it. Being in SF, I'm probably the physically closest PG dev involved in the discussion. </blockquote> OK, I've dropped the PC a note; hopefully you'll be hearing from them. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-17 21:19:53 </code></pre> On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote: <blockquote> On 10 April 2018 at 02:59, Craig Ringer wrote: <blockquote> Nitpick: In most cases the kernel reserves disk space immediately, before returning from write(). NFS seems to be the main exception here. </blockquote> I'm kind of puzzled by this. Surely NFS servers store the data in the filesystem using write(2) or the in-kernel equivalent? 
So if the server is backed by a filesystem where write(2) preallocates space, surely the NFS server must behave as if it's preallocating as well? I would expect NFS to provide basically the same set of possible failures as the underlying filesystem (as long as you don't enable nosync of course). </blockquote> I don't think the write is sent to the NFS server at the time of the write, so while the NFS side would reserve the space, it might not get the write request until after we return write success to the process. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-17 21:29:17 </code></pre> On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: <blockquote> On 04/09/2018 12:29 AM, Bruce Momjian wrote: <blockquote> A crazy idea would be to have a daemon that checks the logs and stops Postgres when something seems wrong. </blockquote> That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how that would work with containers (where I think you may not have access to the kernel log at all). Also, I'm pretty sure the messages do change based on kernel version (and possibly filesystem), so parsing it reliably seems rather difficult. And we probably don't want to PANIC after an I/O error on an unrelated device, so we'd need to understand which devices are related to PostgreSQL. </blockquote> My more-considered crazy idea is to have a postgresql.conf setting like archive_command that allows the administrator to specify a command that will be run after fsync but before the checkpoint is marked as complete. While we can have write flush errors before fsync and never see the errors during fsync, we will not have write flush errors after fsync that are associated with previous writes. The script should check for I/O or space-exhaustion errors and return false in that case, in which case we can stop, or maybe stop and crash recover. We could have an exit of 1 do the former, and an exit of 2 do the latter.
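Bruce's proposal can be sketched as follows. No such postgresql.conf setting exists. The function name, the enum, and the choice to treat an unknown exit status as "stop" are assumptions made for illustration. Only the exit-code mapping comes from the proposal above: 0 means the checkpoint may be marked complete, 1 means stop, and 2 means stop and crash recover.

```python
import subprocess
from enum import Enum


class CheckpointAction(Enum):
    COMPLETE = "mark checkpoint complete"
    STOP = "stop the server"
    STOP_AND_RECOVER = "stop and perform crash recovery"


def run_post_fsync_check(command: list[str]) -> CheckpointAction:
    """Hypothetical hook: run an administrator-supplied check after the
    checkpoint's fsync() calls, before the checkpoint is recorded."""
    result = subprocess.run(command)
    if result.returncode == 0:
        return CheckpointAction.COMPLETE
    if result.returncode == 1:
        return CheckpointAction.STOP
    if result.returncode == 2:
        return CheckpointAction.STOP_AND_RECOVER
    # Assumption: an unexpected status is treated as the safer "stop".
    return CheckpointAction.STOP
```

The checkpoint code would call this once per checkpoint and refuse to advance the redo pointer unless it gets `COMPLETE`.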
Also, if we are relying on WAL, we have to make sure WAL is actually safe with fsync, and I am betting only the O_DIRECT methods actually are safe: <pre><code>#wal_sync_method = fsync        # the default is the first option
                                # supported by the operating system:
                                #   open_datasync
                            --> #   fdatasync (default on Linux)
                            --> #   fsync
                            --> #   fsync_writethrough
                                #   open_sync
</code></pre> I am betting the marked wal_sync_method methods are not safe since there is time between the write and fsync. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-17 21:32:45 </code></pre> On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: <blockquote> On 04/09/2018 12:29 AM, Bruce Momjian wrote: <blockquote> A crazy idea would be to have a daemon that checks the logs and stops Postgres when something seems wrong. </blockquote> That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how that would work with containers (where I think you may not have access to the kernel log at all). Also, I'm pretty sure the messages do change based on kernel version (and possibly filesystem), so parsing it reliably seems rather difficult. And we probably don't want to PANIC after an I/O error on an unrelated device, so we'd need to understand which devices are related to PostgreSQL. </blockquote> Replying to your specific case, I am not sure how we would use a script to check for I/O errors/space-exhaustion if the postgres user doesn't have access to it. Does O_DIRECT work in such container cases?
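For readers less familiar with the wal_sync_method options listed above, the difference Bruce points at can be sketched in Python. This is not how PostgreSQL implements WAL flushing. The two hypothetical functions only contrast the two styles:

- With write() followed by fdatasync(), a writeback failure is reported by the later flush call.
- With a file opened with O_DSYNC, each write() flushes before returning, so an error should surface from write() itself.

Whether every error is actually reported that way depends on the kernel and filesystem.

```python
import os

# fdatasync() is not available everywhere (e.g. macOS); fall back to fsync().
_datasync = getattr(os, "fdatasync", os.fsync)
# Prefer O_DSYNC; O_SYNC is the stricter fallback where it is missing.
_O_DSYNC = getattr(os, "O_DSYNC", os.O_SYNC)


def _write_all(fd: int, data: bytes) -> None:
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]  # handle short writes


def append_then_fdatasync(path: str, record: bytes) -> None:
    """Roughly the 'fdatasync' style: buffered write, then an explicit
    flush. A writeback error is only reported by the _datasync() call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        _write_all(fd, record)
        _datasync(fd)  # raises OSError if writeback failed
    finally:
        os.close(fd)


def append_with_o_dsync(path: str, record: bytes) -> None:
    """Roughly the 'open_datasync' style: each write() returns only after
    the data has been flushed, so write() itself reports the failure."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | _O_DSYNC, 0o600)
    try:
        _write_all(fd, record)
    finally:
        os.close(fd)
```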
<hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-17 21:34:53 </code></pre> On 2018-04-17 17:29:17 -0400, Bruce Momjian wrote: <blockquote> Also, if we are relying on WAL, we have to make sure WAL is actually safe with fsync, and I am betting only the O_DIRECT methods actually are safe: <pre><code>> #wal_sync_method = fsync        # the default is the first option
>                                 # supported by the operating system:
>                                 #   open_datasync
>                             --> #   fdatasync (default on Linux)
>                             --> #   fsync
>                             --> #   fsync_writethrough
>                                 #   open_sync
</code></pre> I am betting the marked wal_sync_method methods are not safe since there is time between the write and fsync. </blockquote> Hm? That's not really the issue though? One issue is that retries are not necessarily safe in buffered IO; the other is that fsync might not report an error if the fd was closed and reopened. O_DIRECT is only used if wal archiving or streaming isn't used, which makes it pretty useless anyway. <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-17 21:41:42 </code></pre> On 2018-04-17 17:32:45 -0400, Bruce Momjian wrote: <blockquote> On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: <blockquote> That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how that would work with containers (where I think you may not have access to the kernel log at all). Also, I'm pretty sure the messages do change based on kernel version (and possibly filesystem), so parsing it reliably seems rather difficult. And we probably don't want to PANIC after an I/O error on an unrelated device, so we'd need to understand which devices are related to PostgreSQL. </blockquote> </blockquote> You can certainly have access to the kernel log in containers. I'd assume such a script wouldn't check various system logs but instead tail /dev/kmsg or such. Otherwise the variance between installations would be too big.
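Here is a minimal sketch of the kind of kernel-log check being discussed. It assumes the lines have already been read from /dev/kmsg or `dmesg` output. The regular expressions are illustrative examples of common Linux messages, not a complete or stable list. The device filter addresses Tomas's concern about errors on unrelated devices, and the `seen` set implements the idea of reporting only entries an earlier check has not already reported. `new_storage_errors` is a hypothetical name.

```python
import re
from typing import Iterable

# Illustrative patterns only: real kernel messages vary by kernel version,
# driver and filesystem, so a production list would need curating.
ERROR_PATTERNS = [
    re.compile(r"I/O error"),
    re.compile(r"EXT4-fs (error|warning)"),
    re.compile(r"XFS \(.*\): .*(error|shutdown)", re.IGNORECASE),
]


def new_storage_errors(lines: Iterable[str], seen: set[str],
                       devices: Iterable[str]) -> list[str]:
    """Return log lines that mention one of `devices`, match an error
    pattern, and were not reported before. `seen` is updated in place so
    a later run does not re-report messages that were already handled."""
    wanted = tuple(devices)
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if line in seen:
            continue
        if not any(dev in line for dev in wanted):
            continue
        if any(p.search(line) for p in ERROR_PATTERNS):
            hits.append(line)
            seen.add(line)
    return hits
```

The sketch deduplicates by exact line text. That relies on kernel log lines carrying a timestamp or sequence number, so a recurrence of the same error shows up as a new line.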
There aren't that many different types of error messages, and they don't change that often. If we just detected errors for the most common FSs, we'd probably be good. Detecting a few general storage layer messages wouldn't be that hard either; most things have been unified over the last ~8-10 years. <blockquote> Replying to your specific case, I am not sure how we would use a script to check for I/O errors/space-exhaustion if the postgres user doesn't have access to it. </blockquote> Not sure what you mean? Space exhaustion can be checked when allocating space, FWIW. We'd just need to use posix_fallocate et al. <blockquote> Does O_DIRECT work in such container cases? </blockquote> Yes. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-17 21:49:42 </code></pre> On Mon, Apr 9, 2018 at 12:25:33PM -0700, Peter Geoghegan wrote: <blockquote> On Mon, Apr 9, 2018 at 12:13 PM, Andres Freund wrote: <blockquote> Let's lower the pitchforks a bit here. Obviously a grand rewrite is absurd, as are some of the proposed ways this is all supposed to work. But I think the case we're discussing is much closer to a near irresolvable corner case than anything else. </blockquote> +1 <blockquote> We're talking about the storage layer returning an irresolvable error. You're hosed even if we report it properly. Yes, it'd be nice if we could report it reliably. But that doesn't change the fact that what we're doing is ensuring that data is safely fsynced unless storage fails, in which case it's not safely fsynced anyway. </blockquote> Right. We seem to be implicitly assuming that there is a big difference between a problem in the storage layer that we could in principle detect, but don't, and any other problem in the storage layer. I've read articles claiming that technologies like SMART are not really reliable in a practical sense [1], so it seems to me that there is reason to doubt that this gap is all that big.
That said, I suspect that the problems with running out of disk space are serious practical problems. I have personally scoffed at stories involving Postgres database corruption that gets attributed to running out of disk space. Looks like I was dead wrong. </blockquote> Yes, I think we need to look at user expectations here. If the device has a hardware write error, it is true that it is good to detect it, and it might be permanent or temporary, e.g. NAS/NFS. The longer the error persists, the more likely the user will expect corruption. However, right now, an outage of any length could cause corruption, and it will not be reported in all cases. Running out of disk space is also something you don't expect to corrupt your database --- you expect it to only prevent future writes. It seems NAS/NFS and any thin provisioned storage will have this problem, and again, not always reported. So, our initial action might just be to educate users that write errors can cause silent corruption, and out-of-space errors on NAS/NFS and any thin provisioned storage can cause corruption. Kernel logs (not just Postgres logs) should be monitored for these issues, and failover/recovery might be necessary. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-18 09:52:22 </code></pre> On Tue, Apr 17, 2018 at 02:34:53PM -0700, Andres Freund wrote: <blockquote> On 2018-04-17 17:29:17 -0400, Bruce Momjian wrote: <blockquote> Also, if we are relying on WAL, we have to make sure WAL is actually safe with fsync, and I am betting only the O_DIRECT methods actually are safe: <pre><code>> > #wal_sync_method = fsync        # the default is the first option
> >                                 # supported by the operating system:
> >                                 #   open_datasync
> >                             --> #   fdatasync (default on Linux)
> >                             --> #   fsync
> >                             --> #   fsync_writethrough
> >                                 #   open_sync
</code></pre> I am betting the marked wal_sync_method methods are not safe since there is time between the write and fsync. </blockquote> Hm?
That's not really the issue though? One issue is that retries are not necessarily safe in buffered IO; the other is that fsync might not report an error if the fd was closed and reopened. </blockquote> Well, we have been focusing on the delay between backend or checkpoint writes and checkpoint fsyncs. My point is that we have the same problem in doing a write, then fsync for the WAL. Yes, the delay is much shorter, but the issue still exists. I realize that newer Linux kernels will not have the problem since the file descriptor remains open, but the problem exists with older/common Linux kernels. <blockquote> O_DIRECT is only used if wal archiving or streaming isn't used, which makes it pretty useless anyway. </blockquote> Uh, don't 'open_datasync' and 'open_sync' fsync as part of the write, meaning we can't lose the error report like we can with the others? <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-18 10:04:30 </code></pre> On 18 April 2018 at 05:19, Bruce Momjian wrote: <blockquote> On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote: <blockquote> On 10 April 2018 at 02:59, Craig Ringer wrote: <blockquote> Nitpick: In most cases the kernel reserves disk space immediately, before returning from write(). NFS seems to be the main exception here. </blockquote> I'm kind of puzzled by this. Surely NFS servers store the data in the filesystem using write(2) or the in-kernel equivalent? So if the server is backed by a filesystem where write(2) preallocates space, surely the NFS server must behave as if it's preallocating as well? I would expect NFS to provide basically the same set of possible failures as the underlying filesystem (as long as you don't enable nosync of course). </blockquote> I don't think the write is sent to the NFS server at the time of the write, so while the NFS side would reserve the space, it might not get the write request until after we return write success to the process.
</blockquote> It should be sent if you're using sync mode. From my reading of the docs, if you're using async mode you're already open to so many potential corruptions you might as well not bother. I need to look into this more re NFS and expand the tests I have to cover that properly. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-18 10:19:28 </code></pre> On 10 April 2018 at 20:15, Craig Ringer wrote: <blockquote> On 10 April 2018 at 14:10, Michael Paquier wrote: <blockquote> Well, I think that there is room for improving the reporting of failures in file_utils.c for frontends, or at worst having an exit() for any kind of critical failure, equivalent to a PANIC. </blockquote> Yup. In the meantime, speaking of PANIC, here's the first cut patch to make Pg panic on fsync() failures. I need to do some closer review and testing, but it's presented here for anyone interested. I intentionally left some failures as ERROR not PANIC, where the entire operation is done as a unit, and an ERROR will cause us to retry the whole thing. For example, when we fsync() a temp file before we move it into place, there's no point panicking on failure, because we'll discard the temp file on ERROR and retry the whole thing. I've verified that it works as expected with some modifications to the test tool I've been using (pushed). The main downside is that if we panic in redo, we don't try again. We throw our toys and shut down. But arguably if we get the same I/O error again in redo, that's the right thing to do anyway, and quite likely safer than continuing to ERROR on checkpoints indefinitely. Patch attached. To be clear, this patch only deals with the issue of us retrying fsyncs when it turns out to be unsafe. This does NOT address any of the issues where we won't find out about writeback errors at all. </blockquote> Thinking about this some more, it'll definitely need a GUC to force it to continue despite a potential hazard.
Otherwise we go backwards from the status quo if we're in a position where uptime is vital and correctness problems can be tolerated or repaired later. Kind of like zero_damaged_pages, we'll need some sort of continue_after_fsync_errors. Without that, we'll panic once, enter redo, and if the problem persists we'll panic in redo and exit the startup process. That's not going to help users. I'll amend the patch accordingly as time permits. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-18 11:46:15 </code></pre> On Wed, Apr 18, 2018 at 06:04:30PM +0800, Craig Ringer wrote: <blockquote> On 18 April 2018 at 05:19, Bruce Momjian wrote: <blockquote> On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote: <blockquote> On 10 April 2018 at 02:59, Craig Ringer wrote: <blockquote> Nitpick: In most cases the kernel reserves disk space immediately, before returning from write(). NFS seems to be the main exception here. </blockquote> I'm kind of puzzled by this. Surely NFS servers store the data in the filesystem using write(2) or the in-kernel equivalent? So if the server is backed by a filesystem where write(2) preallocates space, surely the NFS server must behave as if it's preallocating as well? I would expect NFS to provide basically the same set of possible failures as the underlying filesystem (as long as you don't enable nosync of course). </blockquote> I don't think the write is sent to the NFS server at the time of the write, so while the NFS side would reserve the space, it might not get the write request until after we return write success to the process. </blockquote> It should be sent if you're using sync mode. <blockquote> From my reading of the docs, if you're using async mode you're already open to so many potential corruptions you might as well not bother. </blockquote> I need to look into this more re NFS and expand the tests I have to cover that properly.
</blockquote> So, if sync mode passes the write to NFS, and NFS pre-reserves write space, and throws an error on reservation failure, that means that NFS will not corrupt a cluster on out-of-space errors. So, what about thin provisioning? I can understand sharing free space among file systems, but once a write arrives I assume it reserves the space. Is the problem that many thin provisioning systems don't have a sync mode, so you can't force the write to appear on the device before an fsync? <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-18 11:56:57 </code></pre> On Tue, Apr 17, 2018 at 02:41:42PM -0700, Andres Freund wrote: <blockquote> On 2018-04-17 17:32:45 -0400, Bruce Momjian wrote: <blockquote> On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote: <blockquote> That doesn't seem like a very practical way. It's better than nothing, of course, but I wonder how that would work with containers (where I think you may not have access to the kernel log at all). Also, I'm pretty sure the messages do change based on kernel version (and possibly filesystem), so parsing it reliably seems rather difficult. And we probably don't want to PANIC after an I/O error on an unrelated device, so we'd need to understand which devices are related to PostgreSQL. </blockquote> </blockquote> You can certainly have access to the kernel log in containers. I'd assume such a script wouldn't check various system logs but instead tail /dev/kmsg or such. Otherwise the variance between installations would be too big. </blockquote> I was thinking 'dmesg', but the result is similar. <blockquote> There aren't that many different types of error messages, and they don't change that often. If we just detected errors for the most common FSs, we'd probably be good. Detecting a few general storage layer messages wouldn't be that hard either; most things have been unified over the last ~8-10 years.
</blockquote> It is hard to know exactly what the message format should be for each operating system because it is hard to generate them on demand, and we would need to filter based on Postgres devices. The other issue is that once you see a message during a checkpoint and exit, you don't want to see that message again after the problem has been fixed and the server restarted. The simplest solution is to save the output of the last check and look for only new entries. I am attaching a script I run every 15 minutes from cron that emails me any unexpected kernel messages. I am thinking we would need a contrib module with sample scripts for various operating systems. <blockquote> <blockquote> Replying to your specific case, I am not sure how we would use a script to check for I/O errors/space-exhaustion if the postgres user doesn't have access to it. </blockquote> Not sure what you mean? Space exhaustion can be checked when allocating space, FWIW. We'd just need to use posix_fallocate et al. </blockquote> I was asking about cases where permissions prevent viewing of kernel messages. I think you can view them in containers, but in virtual machines you might not have access to the host operating system's kernel messages, and that might be where they are. <pre><code>Attachment    Content-Type    Size
dmesg_check   text/plain      574 bytes
</code></pre> <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-18 12:45:53 </code></pre> On 18 April 2018 at 19:46, Bruce Momjian wrote: <blockquote> So, if sync mode passes the write to NFS, and NFS pre-reserves write space, and throws an error on reservation failure, that means that NFS will not corrupt a cluster on out-of-space errors. </blockquote> Yeah. I need to verify in a concrete test case. The thing is that write() is allowed to be asynchronous anyway. Most file systems choose to implement eager reservation of space, but it's not mandated.
AFAICS that's largely a historical accident to keep applications happy, because FSes used to allocate the space at write() time too, and when they moved to delayed allocations, apps tended to break too easily unless they at least reserved space. NFS would have to do a round-trip on write() to reserve space. The Linux man pages (<a href="http://man7.org/linux/man-pages/man2/write.2.html">http://man7.org/linux/man-pages/man2/write.2.html</a>) say: <blockquote> A successful return from write() does not make any guarantee that data has been committed to disk. On some filesystems, including NFS, it does not even guarantee that space has successfully been reserved for the data. In this case, some errors might be delayed until a future write(2), fsync(2), or even close(2). The only way to be sure is to call fsync(2) after you are done writing all your data. </blockquote> ... and I'm inclined to believe it when it refuses to make guarantees. Especially lately. <blockquote> So, what about thin provisioning? I can understand sharing free space among file systems </blockquote> Most thin provisioning is done at the block level, not the file system level. So the FS is usually unaware it's on a thin-provisioned volume. Usually the whole kernel is unaware, because the thin provisioning is done on the SAN end or by a hypervisor. But the same sort of thing may be done via LVM - see lvmthin. For example, you may make 100 different 1TB ext4 FSes, each on its own 1TB iSCSI volume, backed by a SAN with a total of 50TB of concrete physical capacity. The SAN is doing block mapping and only allocating storage chunks to a given volume when the FS has written blocks to every previous free block in the previous storage chunk. It may also do things like block de-duplication, compression of storage chunks that aren't written to for a while, etc. The idea is that when the SAN's actual physically allocated storage gets to 40TB it starts telling you to go buy another rack of storage so you don't run out.
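The block-level model described here can be reduced to a toy Python simulation. As a simplifying assumption, the pool backs a fixed-size chunk the first time any block in that chunk is written, and it knows nothing about filesystem free-space counters. `ThinPool` and its 1024-blocks-per-chunk default are hypothetical. The point the model shows is that every volume can look half-empty to its own filesystem while the pool underneath has nothing left.

```python
class ThinPoolFull(Exception):
    pass


class ThinPool:
    """Toy thin-provisioned pool: volumes advertise a logical size, but a
    physical chunk is allocated only the first time a block in it is
    written. Filesystem free-space accounting is invisible to the pool."""

    def __init__(self, physical_chunks: int) -> None:
        self.free_chunks = physical_chunks
        self.mapped: set[tuple[str, int]] = set()  # (volume, chunk) pairs

    def write_block(self, volume: str, block: int,
                    blocks_per_chunk: int = 1024) -> None:
        chunk = (volume, block // blocks_per_chunk)
        if chunk in self.mapped:
            return  # already backed: overwriting needs no new space
        if self.free_chunks == 0:
            raise ThinPoolFull(f"no backing store for {volume} block {block}")
        self.free_chunks -= 1
        self.mapped.add(chunk)
```

For example, take a pool with 100 free chunks serving 100 volumes that each touch two new chunks. Half of those writes fail, even though no individual volume is anywhere near its advertised size.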
You don't have to resize volumes, resize file systems, etc. All the storage space admin is centralized on the SAN and storage team, and your sysadmins, DBAs and app devs are none the wiser. You buy storage when you need it, not when the DBA demands they need a 200% free space margin just in case. Whether or not you agree with this philosophy or think it's sensible is kind of moot, because it's an extremely widespread model, and servers you work on may well be backed by thin provisioned storage even if you don't know it. Think of it as a bit like VM overcommit, for storage. You can malloc() as much memory as you like and everything's fine until you try to actually use it. Then you go to dirty a page, no free pages are available, and boom. The thing is, the SAN (or LVM) doesn't have any idea about the FS's internal in-memory free space counter and its space reservations. Nor does it understand any FS metadata. All it cares about is "has this LBA ever been written to by the FS?". If so, it must make sure backing storage for it exists. If not, it won't bother. Most FSes only touch the blocks on dirty writeback, or sometimes lazily as part of delayed allocation. So if your SAN is running out of space and there's 100MB free, each of your 100 FSes may have decremented its freelist by 2MB and be happily promising more space to apps on write() because, well, as far as they know they're only 50% full. When they all do dirty writeback and flush to storage, kaboom, there's nowhere to put some of the data. I don't know if posix_fallocate is a sufficient safeguard either. You'd have to actually force writes to each page through to the backing storage to know for sure the space existed. Yes, the docs say <blockquote> After a successful call to posix_fallocate(), subsequent writes to bytes in the specified range are guaranteed not to fail because of lack of disk space. </blockquote> ... but they're speaking from the filesystem's perspective. 
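Craig distinguishes between reserving space in the filesystem and actually backing it on the device. That distinction can be sketched in Python, assuming Linux (`os.posix_fallocate` is not available on every platform). Both helpers are hypothetical:

- The first only asks the filesystem to reserve space.
- The second writes zeros over the whole range and calls fsync(), in the spirit of dd'ing a freshly attached volume, so a thin-provisioned layer has to allocate backing storage now.

Even the second approach is hedged: some storage systems detect zero-filled writes and may not allocate for them.

```python
import os


def reserve_with_fallocate(fd: int, length: int) -> None:
    """Reserve `length` bytes from offset 0 in the filesystem's own
    accounting. The data blocks need not be written, so a thin-provisioned
    device underneath may never be asked to back them."""
    os.posix_fallocate(fd, 0, length)


def reserve_by_writing_zeros(fd: int, length: int, chunk: int = 1 << 20) -> None:
    """Write zeros over [0, length) and flush them, so lower layers must
    allocate backing storage now (unless they special-case zero writes)."""
    zeros = bytes(chunk)
    offset = 0
    while offset < length:
        n = min(chunk, length - offset)
        offset += os.pwrite(fd, zeros[:n], offset)
    os.fsync(fd)  # surface any writeback or out-of-space error here
```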
If the FS doesn't dirty and flush the actual blocks, a thin provisioned storage system won't know. It's reasonable enough to throw up our hands in this case and say "your setup is crazy, you're breaking the rules, don't do that". The truth is they AREN'T breaking the rules, but we can disclaim support for such configurations anyway. After all, we tell people not to use Linux's VM overcommit too. How's that working for you? I see it enabled on the great majority of systems I work with, and some people are very reluctant to turn it off because they don't want to have to add swap. If someone has a 50TB SAN and wants to allow for unpredictable space use expansion between various volumes, and we say "you can't do that, go buy a 100TB SAN instead" ... that's not going to go down too well either. Often we can actually say "make sure the 5TB volume PostgreSQL is using is eagerly provisioned, and expand it at need using online resize if required. We don't care about the rest of the SAN.". I guarantee you that when you create a 100GB EBS volume on AWS EC2, you don't get 100GB of storage preallocated. AWS are probably pretty good about not running out of backing store, though. There are file systems optimised for thin provisioning, etc, too. But that's more commonly done by having them do things like zero deallocated space so the thin provisioning system knows it can return it to the free pool, and now things like DISCARD provide much of that signalling in a standard way. <hr /> <pre><code>From:Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> Date:2018-04-18 23:31:50 </code></pre> On 19/04/18 00:45, Craig Ringer wrote: <blockquote> I guarantee you that when you create a 100GB EBS volume on AWS EC2, you don't get 100GB of storage preallocated. AWS are probably pretty good about not running out of backing store, though. 
</blockquote> Some db folks (used to anyway) advise dd'ing to your freshly attached devices on AWS (for performance mainly IIRC), but that would also help prevent some failure scenarios for any thin provisioned storage (but probably really annoy the admins thereof). <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-19 00:44:33 </code></pre> On 19 April 2018 at 07:31, Mark Kirkwood wrote: <blockquote> On 19/04/18 00:45, Craig Ringer wrote: <blockquote> I guarantee you that when you create a 100GB EBS volume on AWS EC2, you don't get 100GB of storage preallocated. AWS are probably pretty good about not running out of backing store, though. </blockquote> Some db folks (used to anyway) advise dd'ing to your freshly attached devices on AWS (for performance mainly IIRC), but that would also help prevent some failure scenarios for any thin provisioned storage (but probably really annoy the admins thereof). </blockquote> This still makes a lot of sense on AWS EBS, particularly when using a volume created from a non-empty snapshot. Performance of S3-snapshot based EBS volumes is spectacularly awful, since they're copy-on-read. Reading the whole volume helps a lot. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-20 20:49:08 </code></pre> On Wed, Apr 18, 2018 at 08:45:53PM +0800, Craig Ringer wrote: <blockquote> On 18 April 2018 at 19:46, Bruce Momjian wrote: <blockquote> So, if sync mode passes the write to NFS, and NFS pre-reserves write space, and throws an error on reservation failure, that means that NFS will not corrupt a cluster on out-of-space errors. </blockquote> Yeah. I need to verify in a concrete test case. </blockquote> Thanks. <blockquote> The thing is that write() is allowed to be asynchronous anyway. Most file systems choose to implement eager reservation of space, but it's not mandated.
AFAICS that's largely a historical accident to keep applications happy, because FSes used to allocate the space at write() time too, and when they moved to delayed allocations, apps tended to break too easily unless they at least reserved space. NFS would have to do a round-trip on write() to reserve space. The Linux man pages (<a href="http://man7.org/linux/man-pages/man2/write.2.html">http://man7.org/linux/man-pages/man2/write.2.html</a>) say: "A successful return from write() does not make any guarantee that data has been committed to disk. On some filesystems, including NFS, it does not even guarantee that space has successfully been reserved for the data. In this case, some errors might be delayed until a future write(2), fsync(2), or even close(2). The only way to be sure is to call fsync(2) after you are done writing all your data." ... and I'm inclined to believe it when it refuses to make guarantees. Especially lately. </blockquote> Uh, even calling fsync after write isn't 100% safe since the kernel could have flushed the dirty pages to storage, and failed, and the fsync would later succeed. I realize newer kernels have that fixed for files open during that operation, but that is the minority of installs. <blockquote> The idea is that when the SAN's actual physically allocated storage gets to 40TB it starts telling you to go buy another rack of storage so you don't run out. You don't have to resize volumes, resize file systems, etc. All the storage space admin is centralized on the SAN and storage team, and your sysadmins, DBAs and app devs are none the wiser. You buy storage when you need it, not when the DBA demands they need a 200% free space margin just in case. Whether or not you agree with this philosophy or think it's sensible is kind of moot, because it's an extremely widespread model, and servers you work on may well be backed by thin provisioned storage even if you don't know it.
Most FSes only touch the blocks on dirty writeback, or sometimes lazily as part of delayed allocation. So if your SAN is running out of space and there's 100MB free, each of your 100 FSes may have decremented its freelist by 2MB and be happily promising more space to apps on write() because, well, as far as they know they're only 50% full. When they all do dirty writeback and flush to storage, kaboom, there's nowhere to put some of the data. </blockquote> I see what you are saying --- that the kernel is reserving the write space from its free space, but the free space doesn't all exist. I am not sure how we can tell people to make sure the file system free space is real. <blockquote> You'd have to actually force writes to each page through to the backing storage to know for sure the space existed. Yes, the docs say "After a successful call to posix_fallocate(), subsequent writes to bytes in the specified range are guaranteed not to fail because of lack of disk space." ... but they're speaking from the filesystem's perspective. If the FS doesn't dirty and flush the actual blocks, a thin provisioned storage system won't know. </blockquote> Frankly, in what cases will a write fail for lack of free space? It could be a new WAL file (not recycled), or pages added to the end of the table. Is that it? It doesn't sound too terrible. If we can eliminate the corruption due to free space exhaustion, it would be a big step forward. The next most common failure would be temporary storage failure or storage communication failure. Permanent storage failure is "game over" so we don't need to worry about that. <hr /> <pre><code>From:Gasper Zejn <zejn(at)owca(dot)info> Date:2018-04-21 19:21:39 </code></pre> Just for the record, I tried the test case with ZFS on an Ubuntu 17.10 host with ZFS on Linux 0.6.5.11.
ZFS does not swallow the fsync error, but the system does not handle the error nicely: the test case program hangs on fsync, the load jumps up and there's a bunch of z_wr_iss and z_null_int kernel threads belonging to zfs, eating up the CPU. Even then I managed to reboot the system, so it's not a complete and utter mess. The test case adjustments are here: <a href="https://github.com/zejn/scrapcode/commit/e7612536c346d59a4b69bedfbcafbe8c1079063c">https://github.com/zejn/scrapcode/commit/e7612536c346d59a4b69bedfbcafbe8c1079063c</a> Kind regards, <hr /> On 29. 03. 2018 07:25, Craig Ringer wrote: <blockquote> On 29 March 2018 at 13:06, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote: <pre><code>On Thu, Mar 29, 2018 at 6:00 PM, Justin Pryzby wrote: > The retries are the source of the problem; the first fsync() can return EIO, > and also *clears the error* causing a 2nd fsync (of the same data) to return > success. > What I'm failing to grok here is how that error flag even matters, > whether it's a single bit or a counter as described in that patch. If > write back failed, *the page is still dirty*. So all future calls to > fsync() need to try to flush it again, and (presumably) fail > again (unless it happens to succeed this time around). </code></pre> You'd think so. But it doesn't appear to work that way. You can see it for yourself with the error device-mapper target mapped over part of a volume. I wrote a test case here: <a href="https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear.c">https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear.c</a> I don't pretend the kernel behaviour is sane. And it's possible I've made an error in my analysis. But since I've observed this in the wild, and seen it in a test case, I strongly suspect that what I've described is just what's happening, brain-dead or no.
Presumably the kernel marks the page clean when it dispatches it to the I/O subsystem and doesn't dirty it again on I/O error? I haven't dug that deep on the kernel side. See the stackoverflow post for details on what I found in kernel code analysis. </blockquote> <hr /> <pre><code>From:Andres Freund <andres(at)anarazel(dot)de> Date:2018-04-23 20:14:48 </code></pre> Hi, On 2018-03-28 10:23:46 +0800, Craig Ringer wrote: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at least on Linux. When fsync() returns success it means "all writes since the last fsync have hit disk" but we assume it means "all writes since the last SUCCESSFUL fsync have hit disk". But then we retried the checkpoint, which retried the fsync(). The retry succeeded, because the prior fsync() cleared the AS_EIO bad page flag. </blockquote> Random other thing we should look at: Some filesystems (nfs yes, xfs/ext4 no) flush writes at close(2). We check the close() return code, but just log it... So close() counts as an fsync for such filesystems. I'm at LSF/MM to discuss future behaviour of Linux here, but that's how it is right now. <hr /> <pre><code>From:Bruce Momjian <bruce(at)momjian(dot)us> Date:2018-04-24 00:09:23 </code></pre> On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-03-28 10:23:46 +0800, Craig Ringer wrote: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at least on Linux. When fsync() returns success it means "all writes since the last fsync have hit disk" but we assume it means "all writes since the last SUCCESSFUL fsync have hit disk". But then we retried the checkpoint, which retried the fsync(). The retry succeeded, because the prior fsync() cleared the AS_EIO bad page flag. </blockquote> Random other thing we should look at: Some filesystems (nfs yes, xfs/ext4 no) flush writes at close(2). We check the close() return code, but just log it...
So close() counts as an fsync for such filesystems. </blockquote> Well, that's interesting. You might remember that NFS does not reserve space for writes the way local file systems like ext4/xfs do. For that reason, we might be able to capture the out-of-space error on close and exit sooner for NFS. <hr /> <pre><code>From:Craig Ringer <craig(at)2ndquadrant(dot)com> Date:2018-04-26 02:16:52 </code></pre> On 24 April 2018 at 04:14, Andres Freund wrote: <blockquote> I'm at LSF/MM to discuss future behaviour of Linux here, but that's how it is right now. </blockquote> Interim LWN.net coverage of that can be found here: <a href="https://lwn.net/Articles/752613/">https://lwn.net/Articles/752613/</a> <hr /> <pre><code>From:Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> Date:2018-04-27 01:18:55 </code></pre> On Tue, Apr 24, 2018 at 12:09 PM, Bruce Momjian wrote: <blockquote> On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-03-28 10:23:46 +0800, Craig Ringer wrote: <blockquote> TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at least on Linux. When fsync() returns success it means "all writes since the last fsync have hit disk" but we assume it means "all writes since the last SUCCESSFUL fsync have hit disk". But then we retried the checkpoint, which retried the fsync(). The retry succeeded, because the prior fsync() cleared the AS_EIO bad page flag. </blockquote> Random other thing we should look at: Some filesystems (nfs yes, xfs/ext4 no) flush writes at close(2). We check the close() return code, but just log it... So close() counts as an fsync for such filesystems. </blockquote> Well, that's interesting. You might remember that NFS does not reserve space for writes the way local file systems like ext4/xfs do. For that reason, we might be able to capture the out-of-space error on close and exit sooner for NFS.
</blockquote> It seems like some implementations flush on close and therefore discover ENOSPC problems at that point, unless they have an NFSv4 (RFC 3530) "write delegation" with a promise from the server that a certain amount of space is available. It seems like you can't count on that in any way though, because it's the server that decides when to delegate and how much preallocated space to promise, not the client. So in userspace you always need to be able to handle errors including ENOSPC returned by close(), and if you ignore that and you're using an operating system that immediately incinerates all evidence after telling you that (so that later fsync() doesn't fail), you're in trouble. Some relevant code: <ul> <li><a href="https://github.com/torvalds/linux/commit/5445b1fbd123420bffed5e629a420aa2a16bf849">https://github.com/torvalds/linux/commit/5445b1fbd123420bffed5e629a420aa2a16bf849</a></li> <li><a href="https://github.com/freebsd/freebsd/blob/master/sys/fs/nfsclient/nfs_clvnops.c#L618">https://github.com/freebsd/freebsd/blob/master/sys/fs/nfsclient/nfs_clvnops.c#L618</a></li> </ul> It looks like the bleeding edge of the NFS spec includes a new ALLOCATE operation that should be able to support posix_fallocate() (if we were to start using that for extending files): <a href="https://tools.ietf.org/html/rfc7862#page-64">https://tools.ietf.org/html/rfc7862#page-64</a> I'm not sure how reliable [posix_]fallocate is on NFS in general though, and it seems that there are fall-back implementations of posix_fallocate() that write zeros (or even just feign success?) which probably won't do anything useful here if not also flushed (that fallback strategy might only work on eager reservation filesystems that don't have direct fallocate support?), so there are several layers (libc, kernel, nfs client, nfs server) that'd need to be aligned for that to work, and it's not clear how a humble userspace program is supposed to know if they are.
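A minimal Python sketch of the userspace side of this advice, assuming POSIX semantics as exposed by Python's `os` module: check the result of every write, fsync() and close(), and treat any failure as fatal instead of retrying, in line with the "PANIC on fsync() EIO" suggestion earlier in the thread. `DurabilityError`, `write_durably` and `extend_with_zeros` are hypothetical names. `extend_with_zeros` illustrates the "write zeros and flush" extension strategy; per the thread, it only proves the space exists to the extent that every layer underneath (filesystem, thin provisioning, NFS server) honours the flush.

```python
import os


class DurabilityError(RuntimeError):
    """Data may not have reached stable storage; treat as fatal, do not retry fsync()."""


def _write_all(fd: int, data: bytes, offset: int) -> None:
    """pwrite() until every byte is written (short writes are legal)."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, fsync it, and check close() too (NFS may report errors there)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data, 0)
        os.fsync(fd)
    except OSError as exc:
        try:
            os.close(fd)
        except OSError:
            pass  # already failing; report the first error
        raise DurabilityError(f"write/fsync of {path} failed: {exc}") from exc
    try:
        os.close(fd)  # may surface deferred errors such as ENOSPC
    except OSError as exc:
        raise DurabilityError(f"close of {path} failed: {exc}") from exc


def extend_with_zeros(fd: int, offset: int, length: int, chunk: int = 1 << 20) -> None:
    """Hypothetical helper: physically write zeros over [offset, offset + length)
    and fsync, so the blocks are dirtied and flushed rather than merely reserved."""
    zeros = bytes(chunk)
    end = offset + length
    try:
        while offset < end:
            n = min(chunk, end - offset)
            _write_all(fd, zeros[:n], offset)
            offset += n
        os.fsync(fd)
    except OSError as exc:
        raise DurabilityError(f"zero-fill extension failed: {exc}") from exc
```

Raising instead of retrying is the design choice that matters here: once fsync() or close() has reported an error, a later successful fsync() says nothing about the data that was lost.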
I guess if you could find a way to amortise the cost of extending (like Oracle et al do by extending big container datafiles 10MB at a time or whatever), then simply writing zeros and flushing when doing that might work out OK, so you wouldn't need such a thing? (Unless of course it's a COW filesystem, but that's a different can of worms.) <hr /> This thread continues on the <code>ext4</code> mailing list: <hr /> <pre><code>From: "Joshua D. Drake" <jd@...mandprompt.com> Subject: fsync() errors is unsafe and risks data loss Date: Tue, 10 Apr 2018 09:28:15 -0700 </code></pre> -ext4, If this is not the appropriate list please point me in the right direction. I am a PostgreSQL contributor and we have come across a reliability problem with writes and fsync(). You can see the thread here: <a href="https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz">https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz</a> The tl;dr; in the first message doesn't quite describe the problem as we started to dig into it further. <hr /> <pre><code>From: "Darrick J. Wong" <darrick.wong@...cle.com> Date: Tue, 10 Apr 2018 09:54:43 -0700 </code></pre> On Tue, Apr 10, 2018 at 09:28:15AM -0700, Joshua D. Drake wrote: <blockquote> -ext4, If this is not the appropriate list please point me in the right direction. I am a PostgreSQL contributor and we have come across a reliability problem with writes and fsync(). You can see the thread here: <a href="https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz">https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz</a> The tl;dr; in the first message doesn't quite describe the problem as we started to dig into it further. 
</blockquote> You might try the XFS list (linux-xfs@...r.kernel.org) seeing as the initial complaint is against xfs behaviors... <hr /> <pre><code>From: "Joshua D. Drake" <jd@...mandprompt.com> Date: Tue, 10 Apr 2018 09:58:21 -0700 </code></pre> On 04/10/2018 09:54 AM, Darrick J. Wong wrote: <blockquote> On Tue, Apr 10, 2018 at 09:28:15AM -0700, Joshua D. Drake wrote: <blockquote> -ext4, If this is not the appropriate list please point me in the right direction. I am a PostgreSQL contributor and we have come across a reliability problem with writes and fsync(). You can see the thread here: <a href="https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz">https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz</a> The tl;dr; in the first message doesn't quite describe the problem as we started to dig into it further. </blockquote> You might try the XFS list (linux-xfs@...r.kernel.org) seeing as the initial complaint is against xfs behaviors... </blockquote> Later in the thread it becomes apparent that it applies to ext4 (NFS too) as well. I picked ext4 because I assumed it is the most populated of the lists since it's the default filesystem for most distributions. <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Tue, 10 Apr 2018 14:43:56 -0400 </code></pre> Hi Joshua, This isn't actually an ext4 issue, but a long-standing VFS/MM issue. There are going to be multiple opinions about what the right thing to do is. I'll try to give as unbiased a description as possible, but certainly some of this is going to be filtered by my own biases no matter how careful I can be. First of all, what storage devices will do when they hit an exception condition is quite non-deterministic. For example, the vast majority of SSDs are not power fail certified.
What this means is that if they suffer a power drop while they are doing a GC, it is quite possible for data written six months ago to be lost as a result. The LBA could potentially be far, far away from any LBAs that were recently written, and there could have been multiple CACHE FLUSH operations in the time since the LBA in question was last written six months ago. No matter; for a consumer-grade SSD, it's possible for that LBA to be trashed after an unexpected power drop. Which is why after a while, one can get quite paranoid and assume that the only way you can guarantee data robustness is to store multiple copies and/or use erasure coding, with some of the copies or shards written to geographically diverse data centers. Secondly, I think it's fair to say that the vast majority of the companies who require data robustness, and are either willing to pay $$$ to an enterprise distro company like Red Hat, or command a large enough paying customer base that they can afford to dictate terms to an enterprise distro, or hire a consultant such as Christoph, or have their own staffed Linux kernel teams, have tended to use O_DIRECT. So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. Next, the reason why fsync() has the behaviour that it does is that one of the most common cases of I/O storage errors in buffered use cases, certainly as seen by the community distros, is the user who pulls out a USB stick while it is in use. In that case, if there are dirtied pages in the page cache, the question is what can you do? Sooner or later the writes will time out, and if you leave the pages dirty, then it effectively becomes a permanent memory leak. You can't unmount the file system --- that requires writing out all of the pages such that the dirty bit is turned off. And if you don't clear the dirty bit on an I/O error, then they can never be cleaned.
You can't even re-insert the USB stick; the re-inserted USB stick will get a new block device. Worse, when the USB stick was pulled, it will have suffered a power drop, and see above about what could happen after a power drop for non-power fail certified flash devices --- it goes double for the cheap sh*t USB sticks found in the checkout aisle of Micro Center. So this is the explanation for why Linux handles I/O errors by clearing the dirty bit after reporting the error up to user space. And why there is no eagerness to solve the problem simply by "don't clear the dirty bit". For every one Postgres installation that might have a better recovery after an I/O error, there's probably a thousand clueless Fedora and Ubuntu users who will have a much worse user experience after a USB stick pull happens. I can think of things that could be done --- for example, it could be switchable on a per-block device basis (or maybe a per-mount basis) whether or not the dirty bit gets cleared after the error is reported to userspace. And perhaps there could be a new unmount flag that causes all dirty pages to be wiped out, which could be used to recover after a permanent loss of the block device. But the question is who is going to invest the time to make these changes? If there is a company who is willing to pay to commission this work, it's almost certainly soluble. Or if a company which has a kernel team on staff is willing to direct an engineer to work on it, it certainly could be solved. But again, of the companies who have client code where we care about robustness and proper handling of failed disk drives, and which have a kernel team on staff, pretty much all of the ones I can think of (e.g., Oracle, Google, etc.) use O_DIRECT and they don't try to make buffered writes and error reporting via fsync(2) work well.
In general these companies want low-level control over buffer cache eviction algorithms, which drives them towards the design decision of effectively implementing the page cache in userspace, and using O_DIRECT reads/writes. If you are aware of a company who is willing to pay to have a new kernel feature implemented to meet your needs, we might be able to refer you to a company or a consultant who might be able to do that work. Let me know off-line if that's the case... <hr /> <pre><code>From: Andreas Dilger <adilger@...ger.ca> Date: Tue, 10 Apr 2018 13:44:48 -0600 </code></pre> On Apr 10, 2018, at 10:50 AM, Joshua D. Drake <a href="mailto:jd@...mandprompt.com">jd@...mandprompt.com</a> wrote: <blockquote> -ext4, If this is not the appropriate list please point me in the right direction. I am a PostgreSQL contributor and we have come across a reliability problem with writes and fsync(). You can see the thread here: <a href="https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz">https://www.postgresql.org/message-id/flat/20180401002038.GA2211%40paquier.xyz#20180401002038.GA2211@paquier.xyz</a> The tl;dr; in the first message doesn't quite describe the problem as we started to dig into it further. </blockquote> Yes, this is a very long thread. The summary is Postgres is unhappy that fsync() on Linux (and also other OSes) returns an error once if there was a prior write() failure, instead of keeping dirty pages in memory forever and trying to rewrite them. This behaviour has existed on Linux forever, and (for better or worse) is the only reasonable behaviour that the kernel can take. I've argued for the opposite behaviour at times, and some subsystems already do limited retries before finally giving up on a failed write, though there are also times when retrying at lower levels is pointless if a higher level of code can handle the failure (e.g. 
mirrored block devices, filesystem data mirroring, userspace data mirroring, or cross-node replication). The confusion is whether fsync() is a "level" state (return error forever if there were pages that could not be written), or an "edge" state (return error only for any write failures since the previous fsync() call). I think Anthony Iliopoulos was pretty clear in his multiple descriptions in that thread of why the current behaviour is needed (OOM of the whole system if dirty pages are kept around forever), but many others were stuck on "I can't believe this is happening??? This is totally unacceptable and every kernel needs to change to match my expectations!!!" without looking at the larger picture of what is practical to change and where the issue should best be fixed. Regardless of why this is the case, the net is that PG needs to deal with all of the systems that currently exist that have this behaviour, even if some day in the future it may change (though that is unlikely). It seems ironic that "keep dirty pages in userspace until fsync() returns success" is totally unacceptable, but "keep dirty pages in the kernel" is fine. My (limited) understanding of databases was that they preferred to cache everything in userspace and use O_DIRECT to write to disk (which returns an error immediately if the write fails and does not double buffer data). <hr /> <pre><code>From: Martin Steigerwald <a href="mailto:martin@...htvoll.de">martin@...htvoll.de</a> Date: Tue, 10 Apr 2018 21:47:21 +0200 </code></pre> Hi Theodore, Darrick, Joshua. CC'd fsdevel as it does not appear to be Ext4 specific to me (and to you as well, Theodore). Theodore Y. Ts'o - 10.04.18, 20:43: <blockquote> This isn't actually an ext4 issue, but a long-standing VFS/MM issue. […] First of all, what storage devices will do when they hit an exception condition is quite non-deterministic. For example, the vast majority of SSDs are not power fail certified.
What this means is that if they suffer a power drop while they are doing a GC, it is quite possible for data written six months ago to be lost as a result. The LBA could potentially be far, far away from any LBAs that were recently written, and there could have been multiple CACHE FLUSH operations in the time since the LBA in question was last written six months ago. No matter; for a consumer-grade SSD, it's possible for that LBA to be trashed after an unexpected power drop. </blockquote> Guh. I was not aware of this. I knew consumer-grade SSDs often do not have power loss protection, but still thought they'd handle garbage collection in an atomic way. Sometimes I am tempted to sing an "all hardware is crap" song (starting with Meltdown/Spectre, then probably heading over to storage devices and so on… including firmware crap like Intel ME). <blockquote> Next, the reason why fsync() has the behaviour that it does is that one of the most common cases of I/O storage errors in buffered use cases, certainly as seen by the community distros, is the user who pulls out a USB stick while it is in use. In that case, if there are dirtied pages in the page cache, the question is what can you do? Sooner or later the writes will time out, and if you leave the pages dirty, then it effectively becomes a permanent memory leak. You can't unmount the file system --- that requires writing out all of the pages such that the dirty bit is turned off. And if you don't clear the dirty bit on an I/O error, then they can never be cleaned. You can't even re-insert the USB stick; the re-inserted USB stick will get a new block device. Worse, when the USB stick was pulled, it will have suffered a power drop, and see above about what could happen after a power drop for non-power fail certified flash devices --- it goes double for the cheap sh*t USB sticks found in the checkout aisle of Micro Center.
</blockquote> From the original PostgreSQL mailing list thread I did not get how exactly FreeBSD's behavior differs from Linux's. I am aware of one operating system that from a user point of view handles this in almost the right way IMHO: AmigaOS. When you removed a floppy disk from the drive while the OS was writing to it, it showed a "You MUST insert volume somename into drive somedrive:" and if you did, it just continued writing. (The part that did not work well was that with the original filesystem, if you did not insert it back, the whole disk was corrupted, usually beyond repair, so the "MUST" was no joke.) In my opinion, from a user's point of view this is the only sane way to handle the premature removal of removable media. I have read of a GSoC project to implement something like this for NetBSD but I did not check on the outcome of it. In MS-DOS I think there was something similar, but MS-DOS is not a multitasking operating system like AmigaOS. Implementing something like this for Linux would be quite a feat, I think, because in addition to the implementation in the kernel, the desktop environment or whatever other userspace you use would need to handle it as well, so you'd have to adapt udev / udisks / probably Systemd. And probably this behavior needs to be restricted to anything that is really removable, and even then, in order to prevent memory exhaustion in case processes continue to write to a removed and not yet re-inserted USB hard disk, the kernel would need to halt processes that issue I/O to this device. (I believe this is what AmigaOS did. It just blocked all subsequent I/O to the device until it was re-inserted. But then the I/O handling in that OS at that time was quite different from what Linux does.) <blockquote> So this is the explanation for why Linux handles I/O errors by clearing the dirty bit after reporting the error up to user space.
And why there is no eagerness to solve the problem simply by "don't clear the dirty bit". For every one Postgres installation that might have a better recovery after an I/O error, there's probably a thousand clueless Fedora and Ubuntu users who will have a much worse user experience after a USB stick pull happens. </blockquote> I was not aware that flash based media may be as crappy as you hint at. From my tests with AmigaOS 4.something or AmigaOS 3.9 + 3rd Party Poseidon USB stack the above mechanism worked even with USB sticks. However, I did not test this often and I did not check for data corruption after a test. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Tue, 10 Apr 2018 15:07:26 -0700 </code></pre> (Sorry if I screwed up the thread structure - I had to reconstruct the reply-to and CC list from the web archive as I've not found a way to properly download an mbox or such of old content. Was subscribed to fsdevel but not ext4 lists) Hi, 2018-04-10 18:43:56 Ted wrote: <blockquote> I'll try to give as unbiased a description as possible, but certainly some of this is going to be filtered by my own biases no matter how careful I can be. </blockquote> Same ;) 2018-04-10 18:43:56 Ted wrote: <blockquote> So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. </blockquote> That's a bit of a cop out. It's not just databases that care. Even more basic tools like SCM, package managers and editors care whether they can get proper responses back from fsync that imply things actually were synced. 2018-04-10 18:43:56 Ted wrote: <blockquote> So this is the explanation for why Linux handles I/O errors by clearing the dirty bit after reporting the error up to user space. And why there is no eagerness to solve the problem simply by "don't clear the dirty bit".
For every one Postgres installation that might have a better recovery after an I/O error, there's probably a thousand clueless Fedora and Ubuntu users who will have a much worse user experience after a USB stick pull happens. </blockquote> I don't think these necessarily are as contradictory goals as you paint them. At least in postgres' case we can deal with the fact that an fsync retry isn't going to fix the problem by reentering crash recovery or just shutting down - therefore we don't need to keep all the dirty buffers around. A per-inode or per-superblock bit that causes further fsyncs to fail would be entirely sufficient for that. While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. Both in postgres, and a lot of other applications, it's not at all guaranteed to consistently have one FD open for every file written. Therefore even the more recent per-fd errseq logic doesn't guarantee that the failure will ever be seen by an application diligently fsync()ing. You'd not even need to have per inode information or such in the case that the block device goes away entirely. As the FS isn't generally unmounted in that case, you could trivially keep a per-mount (or superblock?) bit that says "I died" and set that instead of keeping per inode/whatever information. 2018-04-10 18:43:56 Ted wrote: <blockquote> If you are aware of a company who is willing to pay to have a new kernel feature implemented to meet your needs, we might be able to refer you to a company or a consultant who might be able to do that work. </blockquote> I find it a bit of a disappointing response. 
I think it's fair to say that for advanced features, but we're talking about the basic guarantee that fsync actually does something even remotely reasonable. 2018-04-10 19:44:48 Andreas wrote: <blockquote> The confusion is whether fsync() is a "level" state (return error forever if there were pages that could not be written), or an "edge" state (return error only for any write failures since the previous fsync() call). </blockquote> I don't think that's the full issue. We can deal with the fact that an fsync failure is edge-triggered if there's a guarantee that every process doing so would get it. The fact that one needs to have an FD open from before any failing writes occurred to get a failure, THAT'S the big issue. Beyond postgres, it's a pretty common approach to do work on a lot of files without fsyncing, then iterate over the directory fsync everything, and then assume you're safe. But unless I severely misunderstand something that'd only be safe if you kept an FD for every file open, which isn't realistic for pretty obvious reasons. 2018-04-10 18:43:56 Ted wrote: <blockquote> I think Anthony Iliopoulos was pretty clear in his multiple descriptions in that thread of why the current behaviour is needed (OOM of the whole system if dirty pages are kept around forever), but many others were stuck on "I can't believe this is happening??? This is totally unacceptable and every kernel needs to change to match my expectations!!!" without looking at the larger picture of what is practical to change and where the issue should best be fixed. </blockquote> Everyone can participate in discussions... 
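The distinction Andres draws here can be shown with a toy model: edge-triggered reporting is workable, but only an fd that was already open when writeback failed ever sees the error. This is a simplified sketch and not the kernel's errseq_t code. The names toy_inode, toy_file, toy_open, toy_writeback_error and toy_fsync are hypothetical, and the model assumes each inode keeps only a failure counter plus the most recent error. The sample_current flag switches between two behaviours. One is the open-time sampling Matthew Wilcox describes later in the thread ("we sample the current state of the writeback error and only report new errors"). The other is his proposed alternative of starting from zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): per-inode writeback error
 * tracking with a sequence counter, and per-open-file "already seen" state. */
struct toy_inode {
    unsigned seq;   /* bumped on every background writeback failure */
    int err;        /* most recent error, e.g. EIO */
};

struct toy_file {
    struct toy_inode *inode;
    unsigned since; /* sequence value this open file has already reported */
};

/* Background writeback failed; the dirty pages are assumed to be
 * marked clean already, so only this record of the failure remains. */
static void toy_writeback_error(struct toy_inode *ino, int err)
{
    ino->seq++;
    ino->err = err;
}

/* sample_current == true: a new open only reports errors that occur later.
 * sample_current == false: start from zero, so a new open reports the most
 * recent error (if any) once. */
static struct toy_file toy_open(struct toy_inode *ino, bool sample_current)
{
    struct toy_file f = { ino, sample_current ? ino->seq : 0 };
    return f;
}

/* Edge-triggered: report an error only if one happened since this file
 * last checked, then advance so the same error is not reported again. */
static int toy_fsync(struct toy_file *f)
{
    if (f->since == f->inode->seq)
        return 0;
    f->since = f->inode->seq;
    return -f->inode->err;
}
```

With sample_current set, a file opened before the failure reports -EIO exactly once. A file opened afterwards never reports it, which is the situation of a program that walks a directory and opens each file only to fsync it.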
<hr /> <pre><code>From: Andreas Dilger <adilger@...ger.ca> Date: Wed, 11 Apr 2018 15:52:44 -0600 </code></pre> On Apr 10, 2018, at 4:07 PM, Andres Freund <a href="mailto:andres@...razel.de">andres@...razel.de</a> wrote: <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. </blockquote> That's a bit of a cop out. It's not just databases that care. Even more basic tools like SCM, package managers and editors care whether they can get proper responses back from fsync that imply things actually were synced. </blockquote> Sure, but it is mostly PG that is doing (IMHO) crazy things like writing to thousands(?) of files, closing the file descriptors, then expecting fsync() on a newly-opened fd to return a historical error. If an editor tries to write a file, then calls fsync and gets an error, the user will enter a new pathname and retry the write. The package manager will assume the package installation failed, and uninstall the parts of the package that were already written. There is no way the filesystem can handle the package manager failure case, and keeping the pages dirty and retrying indefinitely may never work (e.g. disk is dead or disconnected, is a sparse volume without any free space, etc). This (IMHO) implies that the higher layer (which knows more about what the write failure implies) needs to deal with this. <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> So this is the explanation for why Linux handles I/O errors by clearing the dirty bit after reporting the error up to user space. And why there is no eagerness to solve the problem simply by "don't clear the dirty bit". For every one Postgres installation that might have a better recovery after an I/O error, there's probably a thousand clueless Fedora and Ubuntu users who will have a much worse user experience after a USB stick pull happens. 
</blockquote> I don't think these necessarily are as contradictory goals as you paint them. At least in postgres' case we can deal with the fact that an fsync retry isn't going to fix the problem by reentering crash recovery or just shutting down - therefore we don't need to keep all the dirty buffers around. A per-inode or per-superblock bit that causes further fsyncs to fail would be entirely sufficient for that. While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. </blockquote> I think there are two issues here - "fsync() on an fd that was just opened" and "persistent error state (without keeping dirty pages in memory)". If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. Consider if there was a per-inode "there was once an error writing this inode" flag. Then fsync() would return an error on the inode forever, since there is no way in POSIX to clear this state, since it would need to be kept in case some new fd is opened on the inode and does an fsync() and wants the error to be returned. IMHO, the only alternative would be to keep the dirty pages in memory until they are written to disk. If that was not possible, what then? It would need a reboot to clear the dirty pages, or truncate the file (discarding all data)? <blockquote> Both in postgres, and a lot of other applications, it's not at all guaranteed to consistently have one FD open for every file written. Therefore even the more recent per-fd errseq logic doesn't guarantee that the failure will ever be seen by an application diligently fsync()ing. </blockquote> ... 
only if the application closes all fds for the file before calling fsync. If any fd is kept open from the time of the failure, it will return the original error on fsync() (and then no longer return it). It's not that you need to keep every fd open forever. You could put them into a shared pool, and re-use them if the file is "re-opened", and call fsync on each fd before it is closed (because the pool is getting too big or because you want to flush the data for that file, or shut down the DB). That wouldn't require a huge re-architecture of PG, just a small library to handle the shared fd pool. That might even improve performance, because opening and closing files is itself not free, especially if you are working with remote filesystems. <blockquote> You'd not even need to have per inode information or such in the case that the block device goes away entirely. As the FS isn't generally unmounted in that case, you could trivially keep a per-mount (or superblock?) bit that says "I died" and set that instead of keeping per inode/whatever information. </blockquote> The filesystem will definitely return an error in this case; I don't think this needs any kind of changes: <pre><code>int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
        ...
        if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
                return -EIO;
</code></pre> <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> If you are aware of a company who is willing to pay to have a new kernel feature implemented to meet your needs, we might be able to refer you to a company or a consultant who might be able to do that work. </blockquote> I find it a bit of a disappointing response. I think it's fair to say that for advanced features, but we're talking about the basic guarantee that fsync actually does something even remotely reasonable. </blockquote> Linux (as PG) is run by people who develop it for their own needs, or are paid to develop it for the needs of others. 
Everyone already has too much work to do, so you need to find someone who has an interest in fixing this (IMHO very peculiar) use case. If PG developers want to add a tunable "keep dirty pages in RAM on IO failure", I don't think that it would be too hard for someone to do. It might be harder to convince some of the kernel maintainers to accept it, and I've been on the losing side of that battle more than once. However, like everything you don't pay for, you can't require someone else to do this for you. It wouldn't hurt to see if Jeff Layton, who wrote the errseq patches, would be interested to work on something like this. That said, even if a fix was available for Linux tomorrow, it would be years before a majority of users would have it available on their systems, and that includes even the errseq mechanism that landed a few months ago. That implies to me that you'd want something that fixes PG now so that it works around whatever (perceived) breakage exists in the Linux fsync() implementation. Since the thread indicates that non-Linux kernels have the same fsync() behaviour, it makes sense to do that even if the Linux fix was available. <blockquote> 2018-04-10 19:44:48 Andreas wrote: <blockquote> The confusion is whether fsync() is a "level" state (return error forever if there were pages that could not be written), or an "edge" state (return error only for any write failures since the previous fsync() call). </blockquote> I don't think that's the full issue. We can deal with the fact that an fsync failure is edge-triggered if there's a guarantee that every process doing so would get it. The fact that one needs to have an FD open from before any failing writes occurred to get a failure, THAT'S the big issue. Beyond postgres, it's a pretty common approach to do work on a lot of files without fsyncing, then iterate over the directory fsync everything, and then assume you're safe. 
But unless I severely misunderstand something that'd only be safe if you kept an FD for every file open, which isn't realistic for pretty obvious reasons. </blockquote> I can't say how common or uncommon such a workload is, though PG is the only application that I've heard of doing it, and I've been working on filesystems for 20 years. I'm a bit surprised that anyone expects fsync() on a newly-opened fd to have any state from write() calls that predate the open. I can understand fsync() returning an error for any IO that happens within the context of that fsync(), but how far should it go back for reporting errors on that file? Forever? The only way to clear the error would be to reboot the system, since I'm not aware of any existing POSIX code to clear such an error. <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Thu, 12 Apr 2018 10:09:16 +1000 </code></pre> On Wed, Apr 11, 2018 at 03:52:44PM -0600, Andreas Dilger wrote: <blockquote> On Apr 10, 2018, at 4:07 PM, Andres Freund <a href="mailto:andres@...razel.de">andres@...razel.de</a> wrote: <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. </blockquote> That's a bit of a cop out. It's not just databases that care. Even more basic tools like SCM, package managers and editors care whether they can get proper responses back from fsync that imply things actually were synced. </blockquote> Sure, but it is mostly PG that is doing (IMHO) crazy things like writing to thousands(?) of files, closing the file descriptors, then expecting fsync() on a newly-opened fd to return a historical error. </blockquote> Yeah, this seems like a recipe for disaster, especially on cross-platform code where every OS platform behaves differently and almost never to expectation. And speaking of "behaving differently to expectations", nobody has mentioned that close() can also return write errors. 
Hence if you do write - close - open - fsync, the write error might get reported on close, not fsync. IOWs, the assumption that "async writeback errors will persist across close to open" is fundamentally broken to begin with. It's even documented as a silent data loss vector in the close(2) man page: <pre><code>$ man 2 close
.....
   Dealing with error returns from close()
       A careful programmer will check the return value of close(), since it
       is quite possible that errors on a previous write(2) operation are
       reported only on the final close() that releases the open file
       description. Failing to check the return value when closing a file
       may lead to silent loss of data. This can especially be observed
       with NFS and with disk quota.
</code></pre> Yeah, ensuring data integrity in the face of IO errors is a really hard problem. :/ To pound the broken record: there are many good reasons why Linux filesystem developers have said "you should use direct IO" to the PG devs each time we have this "the kernel doesn't do [complex things PG needs]" discussion. In this case, robust IO error reporting is easy with DIO. It's one of the reasons most of the high performance database engines are either using or moving to non-blocking AIO+DIO (RWF_NOWAIT) and use O_DSYNC/RWF_DSYNC for integrity-critical IO dispatch. This is also being driven by the availability of high performance, high IOPS solid state storage where buffering in RAM to optimise IO patterns and throughput provides no real performance benefit. Using the AIO+DIO infrastructure ensures errors are reported for the specific write that fails at failure time (i.e. in the aio completion event for the specific IO), yet high IO throughput can be maintained without the application needing its own threading infrastructure to prevent blocking. 
This means the application doesn't have to guess where the write error occurred to retry/recover, have to handle async write errors on close(), have to use fsync() to gather write IO errors and then infer where the IO failure was, or require kernels on every supported platform to jump through hoops to try to do exactly the right thing in error conditions for everyone in all circumstances at all times.... <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Wed, 11 Apr 2018 19:17:52 -0700 </code></pre> On 2018-04-11 15:52:44 -0600, Andreas Dilger wrote: <blockquote> On Apr 10, 2018, at 4:07 PM, Andres Freund <a href="mailto:andres@...razel.de">andres@...razel.de</a> wrote: <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. </blockquote> That's a bit of a cop out. It's not just databases that care. Even more basic tools like SCM, package managers and editors care whether they can get proper responses back from fsync that imply things actually were synced. </blockquote> Sure, but it is mostly PG that is doing (IMHO) crazy things like writing to thousands(?) of files, closing the file descriptors, then expecting fsync() on a newly-opened fd to return a historical error. </blockquote> It's not just postgres. dpkg (underlying apt, on Debian-derived distros), to take an example I just randomly guessed, does too: <pre><code>    /* We want to guarantee the extracted files are on the disk, so that the
     * subsequent renames to the info database do not end up with old or zero
     * length files in case of a system crash. As neither dpkg-deb nor tar do
     * explicit fsync()s, we have to do them here.
     * XXX: This could be avoided by switching to an internal tar extractor.
     */
    dir_sync_contents(cidir);
</code></pre> (a bunch of other places too) Especially on ext3, but also on newer filesystems, it's performance-wise entirely infeasible to fsync() every single file individually - the performance becomes entirely atrocious if you do that. I think there are some legitimate arguments that a database should use direct IO (more on that as a reply to David), but claiming that all sorts of random utilities need to use DIO with buffering etc is just insane. <blockquote> If an editor tries to write a file, then calls fsync and gets an error, the user will enter a new pathname and retry the write. The package manager will assume the package installation failed, and uninstall the parts of the package that were already written. </blockquote> Except that they won't notice that they got a failure, at least in the dpkg case, and will happily continue installing corrupted data. <blockquote> There is no way the filesystem can handle the package manager failure case, and keeping the pages dirty and retrying indefinitely may never work (e.g. disk is dead or disconnected, is a sparse volume without any free space, etc). This (IMHO) implies that the higher layer (which knows more about what the write failure implies) needs to deal with this. </blockquote> Yeah, I agree that'd not be sane. As far as I understand the dpkg code (all of 10 minutes reading it), that'd also be unnecessary. It can abort the installation, but only if it detects the error. Which isn't happening. <blockquote> <blockquote> While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. 
</blockquote> I think there are two issues here - "fsync() on an fd that was just opened" and "persistent error state (without keeping dirty pages in memory)". If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. </blockquote> And that's horrible. If I cp a file, and writeback fails in the background, and I then cat that file before restarting, I should be able to see that that failed. Instead of returning something bogus. Or even more extreme, you untar/zip/git clone a directory. Then do a sync. And you don't know whether anything actually succeeded. <blockquote> Consider if there was a per-inode "there was once an error writing this inode" flag. Then fsync() would return an error on the inode forever, since there is no way in POSIX to clear this state, since it would need to be kept in case some new fd is opened on the inode and does an fsync() and wants the error to be returned. </blockquote> The data in the file also is corrupt. Having to unmount or delete the file to reset the fact that it can't safely be assumed to be on disk isn't insane. <blockquote> <blockquote> Both in postgres, and a lot of other applications, it's not at all guaranteed to consistently have one FD open for every file written. Therefore even the more recent per-fd errseq logic doesn't guarantee that the failure will ever be seen by an application diligently fsync()ing. </blockquote> ... only if the application closes all fds for the file before calling fsync. If any fd is kept open from the time of the failure, it will return the original error on fsync() (and then no longer return it). It's not that you need to keep every fd open forever. 
You could put them into a shared pool, and re-use them if the file is "re-opened", and call fsync on each fd before it is closed (because the pool is getting too big or because you want to flush the data for that file, or shut down the DB). That wouldn't require a huge re-architecture of PG, just a small library to handle the shared fd pool. </blockquote> Except that postgres uses multiple processes, and works on a lot of architectures. If we started to fsync all opened files on process exit, our users would lynch us. We'd need a complicated scheme that sends file descriptors across sockets between processes, then deduplicates them on the receiving side, somehow figuring out which is the oldest file descriptor (handling clock drift safely). Note that it'd be perfectly fine that we've "thrown away" the buffer contents if we'd get notified that the fsync failed. We could just do WAL replay, and restore the contents (just as we do after crashes and/or for replication). <blockquote> That might even improve performance, because opening and closing files is itself not free, especially if you are working with remote filesystems. </blockquote> There's already a per-process cache of open files. <blockquote> <blockquote> You'd not even need to have per inode information or such in the case that the block device goes away entirely. As the FS isn't generally unmounted in that case, you could trivially keep a per-mount (or superblock?) bit that says "I died" and set that instead of keeping per inode/whatever information. 
</blockquote> The filesystem will definitely return an error in this case; I don't think this needs any kind of changes: <pre><code>int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
        ...
        if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
                return -EIO;
</code></pre> </blockquote> Well, I'm making that argument because several people argued that throwing away buffer contents in this case is the only way to not cause OOMs, and that that's incompatible with reporting errors. It's clearly not... <blockquote> <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> If you are aware of a company who is willing to pay to have a new kernel feature implemented to meet your needs, we might be able to refer you to a company or a consultant who might be able to do that work. </blockquote> I find it a bit of a disappointing response. I think it's fair to say that for advanced features, but we're talking about the basic guarantee that fsync actually does something even remotely reasonable. </blockquote> Linux (as PG) is run by people who develop it for their own needs, or are paid to develop it for the needs of others. </blockquote> Sure. <blockquote> Everyone already has too much work to do, so you need to find someone who has an interest in fixing this (IMHO very peculiar) use case. If PG developers want to add a tunable "keep dirty pages in RAM on IO failure", I don't think that it would be too hard for someone to do. It might be harder to convince some of the kernel maintainers to accept it, and I've been on the losing side of that battle more than once. However, like everything you don't pay for, you can't require someone else to do this for you. It wouldn't hurt to see if Jeff Layton, who wrote the errseq patches, would be interested to work on something like this. </blockquote> I don't think this is that PG-specific, as explained above. 
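Andreas's shared-fd-pool suggestion has three parts: keep fds open, re-use one when a file is "re-opened", and fsync each fd before closing it. A minimal single-process sketch follows, under stated assumptions. It is not PostgreSQL code, and the names fd_pool_get, fd_pool_sync_all and POOL_SIZE are hypothetical. Eviction is plain round-robin. As Andres points out, a sketch like this does nothing for the multi-process case, where fds would have to be handed between processes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define POOL_SIZE 4              /* hypothetical, tiny on purpose */

struct fd_slot {
    char *path;                  /* NULL when the slot is empty */
    int fd;
};

static struct fd_slot pool[POOL_SIZE];
static int next_victim;          /* round-robin eviction cursor */

/* fsync() and close() one slot; returns 0 only if both succeeded. */
static int fd_pool_evict(int i)
{
    int rc = 0;

    if (pool[i].path == NULL)
        return 0;
    if (fsync(pool[i].fd) != 0)
        rc = -1;
    if (close(pool[i].fd) != 0)  /* close() can report write errors too */
        rc = -1;
    free(pool[i].path);
    pool[i].path = NULL;
    pool[i].fd = -1;
    return rc;
}

/* Return a pooled fd for `path`, opening it if it is not cached yet.
 * Because the fd stays open, a later fsync() on it still observes
 * writeback errors that happen after it was opened. If evicting another
 * file fails, *evict_failed is set: that file must not be assumed durable. */
static int fd_pool_get(const char *path, int *evict_failed)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i].path != NULL && strcmp(pool[i].path, path) == 0)
            return pool[i].fd;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    char *copy = strdup(path);
    if (copy == NULL) {
        close(fd);
        return -1;
    }

    int i = next_victim;
    next_victim = (next_victim + 1) % POOL_SIZE;
    if (fd_pool_evict(i) != 0 && evict_failed != NULL)
        *evict_failed = 1;
    pool[i].path = copy;
    pool[i].fd = fd;
    return fd;
}

/* Flush and close every pooled fd, e.g. at a checkpoint or shutdown. */
static int fd_pool_sync_all(void)
{
    int rc = 0;

    for (int i = 0; i < POOL_SIZE; i++)
        if (fd_pool_evict(i) != 0)
            rc = -1;
    return rc;
}
```

The design choice that matters is that fsync() runs on the same fd that was used for writing, before that fd is closed. A real implementation would also need an eviction policy smarter than round-robin.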
<hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Wed, 11 Apr 2018 19:32:21 -0700 </code></pre> Hi, On 2018-04-12 10:09:16 +1000, Dave Chinner wrote: <blockquote> To pound the broken record: there are many good reasons why Linux filesystem developers have said "you should use direct IO" to the PG devs each time we have this "the kernel doesn't do [complex things PG needs]" discussion. </blockquote> I personally am on board with doing that. But you also gotta recognize that an efficient DIO usage is a metric ton of work, and you need a large amount of differing logic for different platforms. It's just not realistic to do so for every platform. Postgres is developed by a small number of people, isn't VC backed etc. The amount of resources we can throw at something is fairly limited. I'm hoping to work on adding linux DIO support to pg, but I'm sure as hell not going to be able to do the same on windows (solaris, hpux, aix, ...) etc. And there's cases where that just doesn't help at all. Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. Or even just cp -r ing it, and then starting up a copy of the database. What you're saying is that none of that is doable in a safe way, unless you use special-case DIO using tooling for the whole operation (or at least tools that fsync carefully without ever closing a fd, which certainly isn't the case for cp et al). <blockquote> In this case, robust IO error reporting is easy with DIO. It's one of the reasons most of the high performance database engines are either using or moving to non-blocking AIO+DIO (RWF_NOWAIT) and use O_DSYNC/RWF_DSYNC for integrity-critical IO dispatch. This is also being driven by the availability of high performance, high IOPS solid state storage where buffering in RAM to optimise IO patterns and throughput provides no real performance benefit. 
Using the AIO+DIO infrastructure ensures errors are reported for the specific write that fails at failure time (i.e. in the aio completion event for the specific IO), yet high IO throughput can be maintained without the application needing its own threading infrastructure to prevent blocking. This means the application doesn't have to guess where the write error occurred to retry/recover, have to handle async write errors on close(), have to use fsync() to gather write IO errors and then infer where the IO failure was, or require kernels on every supported platform to jump through hoops to try to do exactly the right thing in error conditions for everyone in all circumstances at all times.... </blockquote> Most of that sounds like a good thing to do, but you got to recognize that that's a lot of Linux-specific code. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Wed, 11 Apr 2018 19:51:13 -0700 </code></pre> Hi, On 2018-04-11 19:32:21 -0700, Andres Freund wrote: <blockquote> And there's cases where that just doesn't help at all. Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. Or even just cp -r ing it, and then starting up a copy of the database. What you're saying is that none of that is doable in a safe way, unless you use special-case DIO using tooling for the whole operation (or at least tools that fsync carefully without ever closing a fd, which certainly isn't the case for cp et al). </blockquote> And before somebody argues that that's too small a window to trigger the problem realistically: Restoring large databases happens pretty commonly (for new replicas, test cases, or actual fatal issues), takes time, and it's where a lot of storage is actually written to for the first time in a while, so it's far from unlikely to trigger bad block errors or such. 
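The "untar, then fsync the directory tree" step Andres describes, like the dpkg dir_sync_contents() call quoted earlier, boils down to something like the following. This is a hypothetical single-directory sketch. fsync_dir_contents is not a real dpkg or PostgreSQL function, and the sketch assumes Linux, where fsync() on a read-only fd is permitted. It checks both fsync() and close() for every regular file and then fsyncs the directory itself. As the thread explains, though, each file is opened only after the writes happened, so a zero result does not prove that earlier background writeback succeeded.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fsync() every regular file directly inside `dir`,
 * then fsync() the directory itself so new entries are persisted too.
 * Returns -1 if `dir` cannot be opened, otherwise the number of failures.
 * Caveat from the thread: each file is opened only now, after the writes,
 * so a writeback error that happened earlier may never be reported here. */
static int fsync_dir_contents(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int failures = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof path) {
            failures++;                /* path too long to handle */
            continue;
        }
        if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                  /* skip subdirectories, symlinks, ... */

        int fd = open(path, O_RDONLY); /* Linux allows fsync() on O_RDONLY */
        if (fd < 0) {
            failures++;
            continue;
        }
        if (fsync(fd) != 0)
            failures++;
        if (close(fd) != 0)            /* close() can report errors too */
            failures++;
    }
    closedir(d);

    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return failures + 1;
    if (fsync(dfd) != 0)
        failures++;
    close(dfd);
    return failures;
}
```

A recursive version would descend into subdirectories in the same way. The limitation stays the same: no fd here was open when the data was written.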
<hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Wed, 11 Apr 2018 20:02:48 -0700 </code></pre> On Wed, Apr 11, 2018 at 07:17:52PM -0700, Andres Freund wrote: <blockquote> <blockquote> <blockquote> While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. </blockquote> I think there are two issues here - "fsync() on an fd that was just opened" and "persistent error state (without keeping dirty pages in memory)". If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. </blockquote> And that's horrible. If I cp a file, and writeback fails in the background, and I then cat that file before restarting, I should be able to see that that failed. Instead of returning something bogus. </blockquote> At the moment, when we open a file, we sample the current state of the writeback error and only report new errors. We could set it to zero instead, and report the most recent error as soon as anything happens which would report an error. That way err = close(open("file")); would report the most recent error. That's not going to be persistent across the data structure for that inode being removed from memory; we'd need filesystem support for persisting that. But maybe it's "good enough" to only support it for recent files. Jeff, what do you think? <hr /> <pre><code>From: "Theodore Y. 
Ts'o" <tytso@....edu> Date: Thu, 12 Apr 2018 01:09:24 -0400 </code></pre> On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote: <blockquote> Most of that sounds like a good thing to do, but you got to recognize that that's a lot of Linux-specific code. </blockquote> I know it's not what PG has chosen, but realistically all of the other major databases and userspace-based storage systems have used DIO precisely because it's the way to avoid OS-specific behavior and the need for OS-specific code. DIO is simple, and pretty much the same everywhere. In contrast, the exact details of how buffered I/O works can be quite different on different OSes. This is especially true if you take performance-related details into account (e.g., the cleaning algorithm, how pages get chosen for eviction, etc.). As I read the PG-hackers thread, I thought I saw acknowledgement that some of the behaviors you don't like with Linux also show up on other Unix or Unix-like systems? <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Thu, 12 Apr 2018 01:34:45 -0400 </code></pre> On Wed, Apr 11, 2018 at 07:17:52PM -0700, Andres Freund wrote: <blockquote> <blockquote> If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. </blockquote> And that's horrible. If I cp a file, and writeback fails in the background, and I then cat that file before restarting, I should be able to see that that failed. Instead of returning something bogus. </blockquote> If there is no open file descriptor, and in many cases, no process (because it has already exited), it may be horrible, but what the h*ll else do you expect the OS to do? The solution we use at Google is that we watch for I/O errors using a completely different process that is responsible for monitoring machine health. 
It used to scrape dmesg, but we now arrange to have I/O errors get sent via a netlink channel to the machine health monitoring daemon. If it detects errors on a particular hard drive, it tells the cluster file system to stop using that disk, and to reconstruct from erasure code all of the data chunks on that disk onto other disks in the cluster. We then run a series of disk diagnostics to make sure we find all of the bad sectors (very often, where there is one bad sector, there are several more waiting to be found), and then afterwards, put the disk back into service. By making it a separate health monitoring process, we can have HDD experts write much more sophisticated code that can ask the disk firmware for more information (e.g., SMART, the grown defect list), do much more careful scrubbing of the disk media, etc., before returning the disk back to service. <blockquote> <blockquote> Everyone already has too much work to do, so you need to find someone who has an interest in fixing this (IMHO very peculiar) use case. If PG developers want to add a tunable "keep dirty pages in RAM on IO failure", I don't think that it would be too hard for someone to do. It might be harder to convince some of the kernel maintainers to accept it, and I've been on the losing side of that battle more than once. However, like everything you don't pay for, you can't require someone else to do this for you. It wouldn't hurt to see if Jeff Layton, who wrote the errseq patches, would be interested to work on something like this. </blockquote> I don't think this is that PG-specific, as explained above. </blockquote> The reality is that recovering from disk errors is tricky business, and I very much doubt most userspace applications, including distro package managers, are going to want to engineer for trying to detect and recover from disk errors. If that were true, then Red Hat and/or SuSE, who have kernel engineers, would have implemented everything on your wish list. 
They haven't, and that should tell you something. The other reality is that once a disk starts developing errors, in reality you will probably need to take the disk off-line, scrub it to find any other media errors, and there's a good chance you'll need to rewrite bad sectors (including some which are on top of file system metadata, so you probably will have to run fsck or reformat the whole file system). I certainly don't think it's realistic to assume adding lots of sophistication to each and every userspace program. If you have tens or hundreds of thousands of disk drives, then you will need to do something automated, but I claim that you really don't want to smush all of that detailed exception handling and HDD repair technology into each database or cluster file system component. It really needs to be done in a separate health-monitor and machine-level management system. <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Thu, 12 Apr 2018 15:45:36 +1000 </code></pre> On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-12 10:09:16 +1000, Dave Chinner wrote: <blockquote> To pound the broken record: there are many good reasons why Linux filesystem developers have said "you should use direct IO" to the PG devs each time we have this "the kernel doesn't do " discussion. </blockquote> I personally am on board with doing that. But you also gotta recognize that an efficient DIO usage is a metric ton of work, and you need a large amount of differing logic for different platforms. It's just not realistic to do so for every platform. Postgres is developed by a small number of people, isn't VC backed etc. The amount of resources we can throw at something is fairly limited. I'm hoping to work on adding linux DIO support to pg, but I'm sure as hell not going to be able to do the same on windows (solaris, hpux, aix, ...) etc. And there's cases where that just doesn't help at all.
Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. </blockquote> Yes it is. This is what syncfs() is for - making sure a large amount of data and metadata spread across many files and subdirectories in a single filesystem is pushed to stable storage in the most efficient manner possible. <blockquote> Or even just cp -r ing it, and then starting up a copy of the database. What you're saying is that none of that is doable in a safe way, unless you use special-case DIO using tooling for the whole operation (or at least tools that fsync carefully without ever closing a fd, which certainly isn't the case for cp et al). </blockquote> No, just saying fsyncing individual files and directories is about the most inefficient way you could possibly go about doing this. <hr /> <pre><code>From: Lukas Czerner <lczerner@...hat.com> Date: Thu, 12 Apr 2018 12:19:26 +0200 </code></pre> On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote: <blockquote> And there's cases where that just doesn't help at all. Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. Or even just cp -r ing it, and then starting up a copy of the database. What you're saying is that none of that is doable in a safe way, unless you use special-case DIO using tooling for the whole operation (or at least tools that fsync carefully without ever closing a fd, which certainly isn't the case for cp et al). </blockquote> Does not seem like a problem to me, just checksum the thing if you really need to be extra safe. You should probably be doing it anyway if you backup / archive / timetravel / whatnot.
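Andres's "untar, then fsync the directory tree" step, which Dave calls about the least efficient way to do it, can be sketched in C with POSIX `nftw()`: fsync every regular file and every directory, children before their parent. This is an illustrative sketch assuming Linux and glibc, not code from the thread or from PostgreSQL; `fsync_tree` is a hypothetical name. It fsyncs through read-only descriptors, which Linux accepts but some other systems may reject.

```c
#define _GNU_SOURCE             /* exposes nftw() and its FTW_* flags on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() has no user-data argument, so the first error lives in a static
 * (which also makes fsync_tree() non-reentrant). */
static int first_error;

static int fsync_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    /* Only regular files and (post-order) directories; FIFOs, devices and
     * symlinks are skipped so open() can't block or follow a link. */
    int is_file = (typeflag == FTW_F && S_ISREG(sb->st_mode));
    if (!is_file && typeflag != FTW_DP)
        return 0;
    /* Read-only fd: Linux allows fsync() on it, other systems may not. */
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        if (first_error == 0)
            first_error = errno;
        return 0;
    }
    if (fsync(fd) != 0 && first_error == 0)
        first_error = errno;
    close(fd);
    return 0;   /* keep walking so every entry gets an fsync() attempt */
}

/* fsync() every regular file and directory under root, children before
 * parents. Returns 0, or the first -errno encountered. */
int fsync_tree(const char *root)
{
    assert(root != NULL);
    first_error = 0;
    /* FTW_PHYS: don't follow symlinks. FTW_DEPTH: a directory is reported
     * (as FTW_DP) only after everything inside it. */
    if (nftw(root, fsync_entry, 64, FTW_PHYS | FTW_DEPTH) != 0)
        return errno != 0 ? -errno : -EIO;
    return first_error != 0 ? -first_error : 0;
}
```

The single-call alternative Dave points to is `syncfs()` on any fd inside the tree; as the thread goes on to discuss, its error reporting is coarser than per-file `fsync()`.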
<hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Thu, 12 Apr 2018 07:09:14 -0400 </code></pre> On Wed, 2018-04-11 at 20:02 -0700, Matthew Wilcox wrote: <blockquote> On Wed, Apr 11, 2018 at 07:17:52PM -0700, Andres Freund wrote: <blockquote> <blockquote> <blockquote> While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. </blockquote> I think there are two issues here - "fsync() on an fd that was just opened" and "persistent error state (without keeping dirty pages in memory)". If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. </blockquote> And that's horrible. If I cp a file, and writeback fails in the background, and I then cat that file before restarting, I should be able to see that that failed. Instead of returning something bogus. </blockquote> </blockquote> What are you expecting to happen in this case? Are you expecting a read error due to a writeback failure? Or are you just saying that we should be invalidating pages that failed to be written back, so that they can be re-read? <blockquote> At the moment, when we open a file, we sample the current state of the writeback error and only report new errors. We could set it to zero instead, and report the most recent error as soon as anything happens which would report an error. That way err = close(open("file")); would report the most recent error. That's not going to be persistent across the data structure for that inode being removed from memory; we'd need filesystem support for persisting that. 
But maybe it's "good enough" to only support it for recent files. Jeff, what do you think? </blockquote> I hate it :). We could do that, but....yecchhhh. Reporting errors only in the case where the inode happened to stick around in the cache seems too unreliable for real-world usage, and might be problematic for some use cases. I'm also not sure it would really be helpful. I think the crux of the matter here is not really about error reporting, per-se. I asked this at LSF last year, and got no real answer: When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. Maybe that's ok in the face of a writeback error though? IDK. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Thu, 12 Apr 2018 04:19:48 -0700 </code></pre> On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: <blockquote> On Wed, 2018-04-11 at 20:02 -0700, Matthew Wilcox wrote: <blockquote> At the moment, when we open a file, we sample the current state of the writeback error and only report new errors. We could set it to zero instead, and report the most recent error as soon as anything happens which would report an error. That way err = close(open("file")); would report the most recent error. That's not going to be persistent across the data structure for that inode being removed from memory; we'd need filesystem support for persisting that. But maybe it's "good enough" to only support it for recent files. Jeff, what do you think? </blockquote> I hate it :). We could do that, but....yecchhhh. 
Reporting errors only in the case where the inode happened to stick around in the cache seems too unreliable for real-world usage, and might be problematic for some use cases. I'm also not sure it would really be helpful. </blockquote> Yeah, it's definitely half-arsed. We could make further changes to improve the situation, but they'd have wider impact. For example, we can tell if the error has been sampled by any existing fd, so we could bias our inode reaping to have inodes with unreported errors stick around in the cache for longer. <blockquote> I think the crux of the matter here is not really about error reporting, per-se. I asked this at LSF last year, and got no real answer: When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? </blockquote> I suspect it isn't. If there's a transient error then we should reattempt the write. OTOH if the error is permanent then reattempting the write isn't going to do any good and it's just going to cause the drive to go through the whole error handling dance again. And what do we do if we're low on memory and need these pages back to avoid going OOM? There's a lot of options here, all of them bad in one situation or another. <blockquote> One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. Maybe that's ok in the face of a writeback error though? IDK. </blockquote> I don't know either. It'd force the application to face up to the fact that the data is gone immediately rather than only finding it out after a reboot. Again though that might cause more problems than it solves. It's hard to know what the right thing to do is. 
<hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Thu, 12 Apr 2018 07:24:12 -0400 </code></pre> On Thu, 2018-04-12 at 15:45 +1000, Dave Chinner wrote: <blockquote> On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-12 10:09:16 +1000, Dave Chinner wrote: <blockquote> To pound the broken record: there are many good reasons why Linux filesystem developers have said "you should use direct IO" to the PG devs each time we have this "the kernel doesn't do " discussion. </blockquote> I personally am on board with doing that. But you also gotta recognize that an efficient DIO usage is a metric ton of work, and you need a large amount of differing logic for different platforms. It's just not realistic to do so for every platform. Postgres is developed by a small number of people, isn't VC backed etc. The amount of resources we can throw at something is fairly limited. I'm hoping to work on adding linux DIO support to pg, but I'm sure as hell not going to be able to do the same on windows (solaris, hpux, aix, ...) etc. And there's cases where that just doesn't help at all. Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. </blockquote> Yes it is. This is what syncfs() is for - making sure a large amount of data and metadata spread across many files and subdirectories in a single filesystem is pushed to stable storage in the most efficient manner possible. </blockquote> Just note that the error return from syncfs is somewhat iffy. It doesn't necessarily return an error when one inode fails to be written back. I think it mainly returns errors when you get a metadata writeback error. <blockquote> <blockquote> Or even just cp -r ing it, and then starting up a copy of the database.
What you're saying is that none of that is doable in a safe way, unless you use special-case DIO using tooling for the whole operation (or at least tools that fsync carefully without ever closing a fd, which certainly isn't the case for cp et al). </blockquote> No, just saying fsyncing individual files and directories is about the most inefficient way you could possibly go about doing this. </blockquote> You can still use syncfs but what you'd probably have to do is call syncfs while you still hold all of the fd's open, and then fsync each one afterward to ensure that they all got written back properly. That should work as you'd expect. <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Thu, 12 Apr 2018 22:01:22 +1000 </code></pre> On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: <blockquote> When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? </blockquote> There isn't a right thing. Whatever we do will be wrong for someone. <blockquote> One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. </blockquote> Not to mention a POSIX IO ordering violation. Seeing stale data after a "successful" write is simply not allowed. <blockquote> Maybe that's ok in the face of a writeback error though? IDK. </blockquote> No matter what we do for async writeback error handling, it will be slightly different from filesystem to filesystem, not to mention OS to OS. There is no magic bullet here, so I'm not sure we should worry too much. There's direct IO for anyone who cares and needs to know about the completion status of every single write IO.... <hr /> <pre><code>From: "Theodore Y.
Ts'o" <tytso@....edu> Date: Thu, 12 Apr 2018 11:16:46 -0400 </code></pre> On Thu, Apr 12, 2018 at 10:01:22PM +1000, Dave Chinner wrote: <blockquote> On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: <blockquote> When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? </blockquote> There isn't a right thing. Whatever we do will be wrong for someone. </blockquote> That's the problem. The best that could be done (and it's not enough) would be to have a mode which does what the PG folks want (or what they think they want). It seems what they want is to have an error result in the page being marked clean. When they discover the outcome (OOM-city and the inability to unmount a file system on a failed drive), then they will complain to us again, at which point we can tell them that what they really want is another variation on O_PONIES, and welcome to the real world and real life. Which is why, even if they were to pay someone to implement what they want, I'm not sure we would want to accept it upstream --- or distro's might consider it a support nightmare, and refuse to allow that mode to be enabled on enterprise distro's. But at least, it will have been some PG-based company who will have implemented it, so they're not wasting other people's time or other people's resources... We could try to get something like what Google is doing upstream, which is to have the I/O errors sent to userspace via a netlink channel (without changing anything else about how buffered writeback is handled in the face of errors).
Then userspace applications could switch to Direct I/O like all of the other really serious userspace storage solutions I'm aware of, and then someone could try to write some kind of HDD health monitoring system that tries to do the right thing when a disk is discovered to have developed some media errors or something more serious (e.g., a head failure). That plus some kind of RAID solution is I think the only thing which is really realistic for a typical PG site. That's certainly what I would do if I didn't decide to use a hosted cloud solution, such as Cloud SQL for Postgres, and let someone else solve the really hard problems of dealing with real-world HDD failures. :-) <hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Thu, 12 Apr 2018 11:08:50 -0400 </code></pre> On Thu, 2018-04-12 at 22:01 +1000, Dave Chinner wrote: <blockquote> On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: <blockquote> When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? </blockquote> There isn't a right thing. Whatever we do will be wrong for someone. <blockquote> One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. </blockquote> Not to mention a POSIX IO ordering violation. Seeing stale data after a "successful" write is simply not allowed. </blockquote> I'm not so sure here, given that we're dealing with an error condition. Are we really obligated not to allow any changes to pages that we can't write back? Given that the pages are clean after these failures, we aren't doing this even today: Suppose we're unable to do writes but can do reads vs. the backing store. After a wb failure, the page has the dirty bit cleared.
If it gets kicked out of the cache before the read occurs, it'll have to be faulted back in. Poof -- your write just disappeared. That can even happen before you get the chance to call fsync, so even a write()+read()+fsync() is not guaranteed to be safe in this regard today, given sufficient memory pressure. I think the current situation is fine from a "let's not OOM at all costs" standpoint, but not so good for application predictability. We should really consider ways to do better here. <blockquote> <blockquote> Maybe that's ok in the face of a writeback error though? IDK. </blockquote> No matter what we do for async writeback error handling, it will be slightly different from filesystem to filesystem, not to mention OS to OS. There is no magic bullet here, so I'm not sure we should worry too much. There's direct IO for anyone who cares and needs to know about the completion status of every single write IO.... </blockquote> I think we have an opportunity here to come up with better defined and hopefully more useful behavior for buffered I/O in the face of writeback errors. The first step would be to hash out what we'd want it to look like. Maybe we need a plenary session at LSF/MM? <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 12:46:27 -0700 </code></pre> Hi, On 2018-04-12 12:19:26 +0200, Lukas Czerner wrote: <blockquote> On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote: <blockquote> And there's cases where that just doesn't help at all. Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. Or even just cp -r ing it, and then starting up a copy of the database.
What you're saying is that none of that is doable in a safe way, unless you use special-case DIO using tooling for the whole operation (or at least tools that fsync carefully without ever closing a fd, which certainly isn't the case for cp et al). </blockquote> Does not seem like a problem to me, just checksum the thing if you really need to be extra safe. You should probably be doing it anyway if you backup / archive / timetravel / whatnot. </blockquote> That doesn't really help, unless you want to sync() and then re-read all the data to make sure it's the same. Rereading multi-TB backups just to know whether there was an error that the OS knew about isn't particularly fun. Without verifying after sync it's not going to improve the situation measurably, you're still only going to discover that $data isn't available when it's needed. What you're saying here is that there's no way to use standard linux tools to manipulate files and know whether it failed, without filtering kernel logs for IO errors. Or am I missing something? <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 12:55:36 -0700 </code></pre> Hi, On 2018-04-12 01:34:45 -0400, Theodore Y. Ts'o wrote: <blockquote> The solution we use at Google is that we watch for I/O errors using a completely different process that is responsible for monitoring machine health. It used to scrape dmesg, but we now arrange to have I/O errors get sent via a netlink channel to the machine health monitoring daemon. </blockquote> Any pointers to the underlying netlink mechanism? If we can force postgres to kill itself when such an error is detected (via a dedicated monitoring process), I'd personally be happy enough. It'd be nicer if we could associate that knowledge with particular filesystems etc (which'd possibly be hard through dm etc?), but this'd be much better than nothing.
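Scraping the kernel log, as Ted says Google's monitor used to do and as Andres would rather avoid, comes down to matching message text. A minimal C sketch, assuming block-layer messages shaped like `blk_update_request: I/O error, dev sda, sector 2048`; that wording is an assumption and changes between kernel versions and drivers, which is exactly why text parsing is fragile. `parse_io_error_dev` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed message shape (varies across kernel versions and drivers):
 *   "blk_update_request: I/O error, dev sda, sector 123456"
 * On a match, copies the device name ("sda") into dev and returns 1.
 * Returns 0 if the line doesn't match or the name doesn't fit in dev. */
int parse_io_error_dev(const char *line, char *dev, size_t devlen)
{
    static const char marker[] = "I/O error, dev ";
    assert(line != NULL && dev != NULL && devlen > 0);

    const char *p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += sizeof(marker) - 1;            /* skip the marker, not its NUL */

    size_t n = strcspn(p, ", \n");      /* device name ends at ',' or space */
    if (n == 0 || n >= devlen)
        return 0;
    memcpy(dev, p, n);
    dev[n] = '\0';
    return 1;
}
```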
<blockquote> The reality is that recovering from disk errors is tricky business, and I very much doubt most userspace applications, including distro package managers, are going to want to engineer for trying to detect and recover from disk errors. If that were true, then Red Hat and/or SuSE have kernel engineers, and they would have implemented everything on your wish list. They haven't, and that should tell you something. </blockquote> The problem really isn't about recovering from disk errors. Knowing about them is the crucial part. We do not want to give back clients the information that an operation succeeded, when it actually didn't. There could be improvements above that, but as long as it's guaranteed that "we" get the error (rather than just some kernel log we don't have access to, which looks different due to config etc), it's ok. We can throw our hands up in the air and give up. <blockquote> The other reality is that once a disk starts developing errors, in reality you will probably need to take the disk off-line, scrub it to find any other media errors, and there's a good chance you'll need to rewrite bad sectors (including some which are on top of file system metadata, so you probably will have to run fsck or reformat the whole file system). I certainly don't think it's realistic to assume adding lots of sophistication to each and every userspace program. If you have tens or hundreds of thousands of disk drives, then you will need to do something automated, but I claim that you really don't want to smush all of that detailed exception handling and HDD repair technology into each database or cluster file system component. It really needs to be done in a separate health-monitor and machine-level management system. </blockquote> Yea, agreed on all that. I don't think anybody actually involved in postgres wants to do anything like that. Seems far outside of postgres' remit.
<hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 13:13:22 -0700 </code></pre> Hi, On 2018-04-12 11:16:46 -0400, Theodore Y. Ts'o wrote: <blockquote> That's the problem. The best that could be done (and it's not enough) would be to have a mode which does what the PG folks want (or what they think they want). It seems what they want is to have an error result in the page being marked clean. When they discover the outcome (OOM-city and the inability to unmount a file system on a failed drive), then they will complain to us again, at which point we can tell them that what they really want is another variation on O_PONIES, and welcome to the real world and real life. </blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. I don't see that that'd realistically would trigger OOM or the inability to unmount a filesystem. If the drive is entirely gone there's obviously no point in keeping per-file information around, so per-blockdev/fs information suffices entirely to return an error on fsync (which at least on ext4 appears to happen if the underlying blockdev is gone). Have fun making up things we want, but I'm not sure it's particularly productive. <blockquote> Which is why, even if they were to pay someone to implement what they want, I'm not sure we would want to accept it upstream --- or distro's might consider it a support nightmare, and refuse to allow that mode to be enabled on enterprise distro's. But at least, it will have been some PG-based company who will have implemented it, so they're not wasting other people's time or other people's resources... </blockquote> Well, that's why I'm discussing here so we can figure out what's acceptable before considering wasting money and review cycles doing or paying somebody to do some crazy useless shit.
<blockquote> We could try to get something like what Google is doing upstream, which is to have the I/O errors sent to userspace via a netlink channel (without changing anything else about how buffered writeback is handled in the face of errors). </blockquote> Ah, darn. After you'd mentioned that in an earlier mail I'd hoped that'd be upstream. And yes, that'd be perfect. <blockquote> Then userspace applications could switch to Direct I/O like all of the other really serious userspace storage solutions I'm aware of, and then someone could try to write some kind of HDD health monitoring system that tries to do the right thing when a disk is discovered to have developed some media errors or something more serious (e.g., a head failure). That plus some kind of RAID solution is I think the only thing which is really realistic for a typical PG site. </blockquote> As I said earlier, I think there's good reason to move to DIO for postgres. But to keep that performant is going to need some serious work. But afaict such a solution wouldn't really depend on applications using DIO or not. Before finishing a checkpoint (logging it persistently and allowing to throw older data away), we could check if any errors have been reported and give up if there have been any. And after starting postgres on a directory restored from backup using $tool, we can fsync the directory recursively, check for such errors, and give up if there've been any. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 13:24:57 -0700 </code></pre> On 2018-04-12 07:09:14 -0400, Jeff Layton wrote: <blockquote> On Wed, 2018-04-11 at 20:02 -0700, Matthew Wilcox wrote: <blockquote> On Wed, Apr 11, 2018 at 07:17:52PM -0700, Andres Freund wrote: <blockquote> <blockquote> <blockquote> While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. 
If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. </blockquote> I think there are two issues here - "fsync() on an fd that was just opened" and "persistent error state (without keeping dirty pages in memory)". If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. </blockquote> And that's horrible. If I cp a file, and writeback fails in the background, and I then cat that file before restarting, I should be able to see that that failed. Instead of returning something bogus. </blockquote> </blockquote> What are you expecting to happen in this case? Are you expecting a read error due to a writeback failure? Or are you just saying that we should be invalidating pages that failed to be written back, so that they can be re-read? </blockquote> Yes, I'd hope for a read error after a writeback failure. I think that's sane behaviour. But I don't really care that much. At the very least some way to know that such a failure occurred from userland without having to parse the kernel log. As far as I understand, neither sync(2) (and thus sync(1)) nor syncfs(2) is guaranteed to report an error if it was encountered by writeback in the background. If that's indeed true for syncfs(2), even if the fd has been opened before (which I can see how it could happen from an implementation POV, nothing would associate a random FD with failures on different files), it's really impossible to detect this stuff from userland without text parsing. Even if it were just a per-fs /sys/$something file that'd return the current count of unreported errors in a filesystem independent way, it'd be better than what we have right now.
<pre><code>1) figure out /sys/$whatnot $directory belongs to
2) oldcount=$(cat /sys/$whatnot/unreported_errors)
3) filesystem operations in $directory
4) sync;sync;
5) newcount=$(cat /sys/$whatnot/unreported_errors)
6) test "$oldcount" -eq "$newcount" || die-with-horrible-message
</code></pre>
Isn't beautiful to script, but it's also not absolutely terrible. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Thu, 12 Apr 2018 13:28:30 -0700 </code></pre> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. <blockquote> I don't see that that'd realistically would trigger OOM or the inability to unmount a filesystem. </blockquote> Ted's referring to the current state of affairs where the writeback error is held in the inode; if we can't evict the inode because it's holding the error indicator, that can send us OOM. If instead we transfer the error indicator to the superblock, then there's no problem. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 14:11:45 -0700 </code></pre> On 2018-04-12 07:24:12 -0400, Jeff Layton wrote: <blockquote> On Thu, 2018-04-12 at 15:45 +1000, Dave Chinner wrote: <blockquote> On Wed, Apr 11, 2018 at 07:32:21PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-12 10:09:16 +1000, Dave Chinner wrote: <blockquote> To pound the broken record: there are many good reasons why Linux filesystem developers have said "you should use direct IO" to the PG devs each time we have this "the kernel doesn't do PG needs>" discussion. </blockquote> I personally am on board with doing that.
But you also gotta recognize that an efficient DIO usage is a metric ton of work, and you need a large amount of differing logic for different platforms. It's just not realistic to do so for every platform. Postgres is developed by a small number of people, isn't VC backed etc. The amount of resources we can throw at something is fairly limited. I'm hoping to work on adding linux DIO support to pg, but I'm sure as hell not going to be able to do the same on windows (solaris, hpux, aix, ...) etc. And there's cases where that just doesn't help at all. Being able to untar a database from backup / archive / timetravel / whatnot, and then fsyncing the directory tree to make sure it's actually safe, is really not an insane idea. </blockquote> Yes it is. This is what syncfs() is for - making sure a large amount of data and metadata spread across many files and subdirectories in a single filesystem is pushed to stable storage in the most efficient manner possible. </blockquote> </blockquote> syncfs isn't standardized, it operates on an entire filesystem (thus writing out unnecessary stuff), it has no meaningful documentation of its return codes. Yes, using syncfs() might be better performance-wise, but it doesn't seem like it actually solves anything, performance aside: <blockquote> Just note that the error return from syncfs is somewhat iffy. It doesn't necessarily return an error when one inode fails to be written back. I think it mainly returns errors when you get a metadata writeback error. You can still use syncfs but what you'd probably have to do is call syncfs while you still hold all of the fd's open, and then fsync each one afterward to ensure that they all got written back properly. That should work as you'd expect. </blockquote> Which again doesn't allow one to use any non-bespoke tooling (like tar or whatnot). And it means you'll have to call syncfs() every few hundred files, because you'll obviously run into filehandle limitations.
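Jeff's recipe (keep all the fds open, call `syncfs()` once, then `fsync()` each fd to collect per-file errors), combined with Andres's point about file-descriptor limits, amounts to a batch helper that a caller invokes once per few hundred files. A minimal sketch assuming Linux (`syncfs()` is Linux-specific), all paths on one filesystem, and NUL-terminated contents; `write_batch_synced` is a hypothetical name.

```c
#define _GNU_SOURCE             /* syncfs() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying short writes. Returns 0 or -errno. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Create/overwrite n files and make them durable as one batch: every fd
 * stays open, one syncfs() flushes the filesystem, then fsync() on each fd
 * reports that file's writeback errors. n must fit under the open-file
 * limit. Returns 0 or the first -errno seen. */
int write_batch_synced(const char *const *paths, const char *const *data,
                       size_t n)
{
    assert((paths != NULL && data != NULL) || n == 0);
    if (n == 0)
        return 0;
    int *fds = malloc(n * sizeof *fds);
    if (fds == NULL)
        return -ENOMEM;

    int first_err = 0;
    size_t opened = 0;
    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
                      0644);
        if (fd < 0) {
            first_err = -errno;
            break;
        }
        fds[opened++] = fd;
        first_err = write_all(fd, data[i], strlen(data[i]));
        if (first_err != 0)
            break;
    }

    /* Only flush if every file was written; fds stay open until after fsync. */
    if (first_err == 0) {
        if (syncfs(fds[0]) != 0)
            first_err = -errno;
        for (size_t i = 0; i < opened; i++)
            if (fsync(fds[i]) != 0 && first_err == 0)
                first_err = -errno;
    }
    for (size_t i = 0; i < opened; i++)
        if (close(fds[i]) != 0 && first_err == 0)
            first_err = -errno;
    free(fds);
    return first_err;
}
```

The fds passed to `fsync()` are the ones the data was written through. That matters because, as described earlier in the thread, a newly opened fd samples the error state at `open()` and may never see a failure that happened before it.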
<hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Thu, 12 Apr 2018 17:14:54 -0400 </code></pre> On Thu, 2018-04-12 at 13:28 -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. </blockquote> Not a bad idea and shouldn't be too costly. mapping_set_error could flag the superblock one before or after the one in the mapping. We'd need to define what happens if you interleave fsync and syncfs calls on the same inode though. How do we handle file->f_wb_err in that case? Would we need a second field in struct file to act as the per-sb error cursor? <blockquote> <blockquote> I don't see that that would realistically trigger OOM or the inability to unmount a filesystem. </blockquote> Ted's referring to the current state of affairs where the writeback error is held in the inode; if we can't evict the inode because it's holding the error indicator, that can send us OOM. If instead we transfer the error indicator to the superblock, then there's no problem. </blockquote> <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Thu, 12 Apr 2018 17:21:44 -0400 </code></pre> On Thu, Apr 12, 2018 at 01:28:30PM -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. 
So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. </blockquote> When or how would the per-superblock wb_err flag get cleared? Would all subsequent fsync() calls on that file system now return EIO? Or would only all subsequent syncfs() calls return EIO? <blockquote> <blockquote> I don't see that that would realistically trigger OOM or the inability to unmount a filesystem. </blockquote> Ted's referring to the current state of affairs where the writeback error is held in the inode; if we can't evict the inode because it's holding the error indicator, that can send us OOM. If instead we transfer the error indicator to the superblock, then there's no problem. </blockquote> Actually, I was referring to the pg-hackers original ask, which was that after an error, all of the dirty pages that couldn't be written out would stay dirty. If it's only a single inode which is pinned in memory with the dirty flag, that's bad, but it's not as bad as pinning all of the memory pages for which there was a failed write. We would still need to invent some mechanism or define some semantic when it would be OK to clear the per-inode flag and let the memory associated with that pinned inode get released, though. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Thu, 12 Apr 2018 14:24:32 -0700 </code></pre> On Thu, Apr 12, 2018 at 05:21:44PM -0400, Theodore Y. Ts'o wrote: <blockquote> On Thu, Apr 12, 2018 at 01:28:30PM -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. 
So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. </blockquote> When or how would the per-superblock wb_err flag get cleared? </blockquote> That's not how errseq works, Ted ;-) <blockquote> Would all subsequent fsync() calls on that file system now return EIO? Or would only all subsequent syncfs() calls return EIO? </blockquote> Only ones which occur after the last sampling get reported through this particular file descriptor. <hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Thu, 12 Apr 2018 17:27:54 -0400 </code></pre> On Thu, 2018-04-12 at 13:24 -0700, Andres Freund wrote: <blockquote> On 2018-04-12 07:09:14 -0400, Jeff Layton wrote: <blockquote> On Wed, 2018-04-11 at 20:02 -0700, Matthew Wilcox wrote: <blockquote> On Wed, Apr 11, 2018 at 07:17:52PM -0700, Andres Freund wrote: <blockquote> <blockquote> <blockquote> While there's some differing opinions on the referenced postgres thread, the fundamental problem isn't so much that a retry won't fix the problem, it's that we might NEVER see the failure. If writeback happens in the background, encounters an error, undirties the buffer, we will happily carry on because we've never seen that. That's when we're majorly screwed. </blockquote> I think there are two issues here - "fsync() on an fd that was just opened" and "persistent error state (without keeping dirty pages in memory)". If there is background data writeback without an open file descriptor, there is no mechanism for the kernel to return an error to any application which may exist, or may not ever come back. </blockquote> And that's horrible. If I cp a file, and writeback fails in the background, and I then cat that file before restarting, I should be able to see that that failed. Instead of returning something bogus. </blockquote> </blockquote> What are you expecting to happen in this case? 
Are you expecting a read error due to a writeback failure? Or are you just saying that we should be invalidating pages that failed to be written back, so that they can be re-read? </blockquote> Yes, I'd hope for a read error after a writeback failure. I think that's sane behaviour. But I don't really care that much. </blockquote> I'll have to respectfully disagree. Why should I interpret an error on a read() syscall to mean that writeback failed? Note that the data is still potentially intact. What might make sense, IMO, is to just invalidate the pages that failed to be written back. Then you could potentially do a read to fault them in again (i.e. sync the pagecache and the backing store) and possibly redirty them for another try. Note that you can detect this situation by checking the return code from fsync. It should report the latest error once per file description. <blockquote> At the very least some way to know that such a failure occurred from userland without having to parse the kernel log. As far as I understand, neither sync(2) (and thus sync(1)) nor syncfs(2) is guaranteed to report an error if it was encountered by writeback in the background. If that's indeed true for syncfs(2), even if the fd has been opened before (which I can see how it could happen from an implementation POV, nothing would associate a random FD with failures on different files), it's really impossible to detect this stuff from userland without text parsing. </blockquote> syncfs could use some work. I'm warming to willy's idea to add a per-sb errseq_t. I think that might be a simple way to get better semantics here. Not sure how we want to handle the reporting end yet though... We probably also need to consider how to better track metadata writeback errors (on e.g. ext2). We don't really do that properly quite yet either. 
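The "once per file description" reporting Jeff describes can be illustrated with a simplified user-space model of the errseq_t idea: the shared object (the mapping today, or the superblock under willy's proposal) keeps a sequence number that is bumped on each writeback error, and every open file description keeps a cursor sampled at open time. The struct and function names are hypothetical; the real kernel errseq_t packs the errno, a counter and a "seen" flag into a single word and has refinements this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>   /* EIO, ENOSPC, ... passed to errseq_set() */

/* Hypothetical, simplified errseq shared by everyone with the file open. */
struct wb_errseq {
    unsigned seq;    /* bumped on every recorded writeback error */
    int err;         /* most recent error, e.g. EIO */
};

/* Per open file description (think of struct file's f_wb_err). */
struct open_file {
    unsigned cursor; /* value of seq this description last reported */
};

/* Writeback of some page in the mapping failed. */
static void errseq_set(struct wb_errseq *e, int err)
{
    assert(err > 0);
    e->err = err;
    e->seq++;
}

/* open(): sample the current state, so earlier errors are not reported. */
static void file_open(struct open_file *f, const struct wb_errseq *e)
{
    f->cursor = e->seq;
}

/* fsync(): report a new error once, then advance the cursor. */
static int file_check_and_advance(struct open_file *f,
                                  const struct wb_errseq *e)
{
    if (f->cursor == e->seq)
        return 0;
    f->cursor = e->seq;
    return -e->err;
}
```

In this model, two descriptors opened before an error each see it exactly once, while a descriptor opened after the error sees nothing, which is the behaviour the PostgreSQL side found problematic.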
<blockquote> Even if it were just a per-fs /sys/$something file that'd return the current count of unreported errors in a filesystem-independent way, it'd be better than what we have right now. <pre><code>1) figure out /sys/$whatnot $directory belongs to
2) oldcount=$(cat /sys/$whatnot/unreported_errors)
3) filesystem operations in $directory
4) sync;sync;
5) newcount=$(cat /sys/$whatnot/unreported_errors)
6) test "$oldcount" -eq "$newcount" || die-with-horrible-message
</code></pre> It isn't beautiful to script, but it's also not absolutely terrible. </blockquote> <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Thu, 12 Apr 2018 14:31:10 -0700 </code></pre> On Thu, Apr 12, 2018 at 05:14:54PM -0400, Jeff Layton wrote: <blockquote> On Thu, 2018-04-12 at 13:28 -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. </blockquote> Not a bad idea and shouldn't be too costly. mapping_set_error could flag the superblock one before or after the one in the mapping. We'd need to define what happens if you interleave fsync and syncfs calls on the same inode though. How do we handle file->f_wb_err in that case? Would we need a second field in struct file to act as the per-sb error cursor? </blockquote> Ooh. I hadn't thought that through. Bleh. I don't want to add a field to struct file for this uncommon case. Maybe O_PATH could be used for this? It gets you a file descriptor on a particular filesystem, so syncfs() is defined, but it can't report a writeback error. 
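The counter-sampling steps in Andres's quoted proposal (sample a per-filesystem error count, do the work, sync, sample again) could look like this in C. The counter file is hypothetical: no filesystem-independent "unreported errors" file exists, so the path is left to the caller, and as Ted points out later in the thread, ext4's errors_count tracks filesystem inconsistencies rather than data I/O errors. read_counter() and counter_changed_since() are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read one decimal counter from a sysfs-style file (steps 2 and 5).
 * Returns -1 if the file cannot be opened or parsed. */
static long read_counter(const char *path)
{
    long v = -1;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Steps 4-6: flush, re-read and compare against the earlier sample.
 * Returns 1 if new errors were counted, 0 if not, -1 if unreadable. */
static int counter_changed_since(const char *path, long before)
{
    long after;

    sync();                       /* step 4; sync(2) returns void */
    after = read_counter(path);
    if (after < 0)
        return -1;
    return after != before;
}
```

A caller would take `long before = read_counter(path);`, run its filesystem operations, and treat any nonzero result from `counter_changed_since(path, before)` as the "die-with-horrible-message" case.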
So if you open something O_PATH, you can use the file's f_wb_err for the mapping's error cursor. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 14:37:56 -0700 </code></pre> On 2018-04-12 17:21:44 -0400, Theodore Y. Ts'o wrote: <blockquote> On Thu, Apr 12, 2018 at 01:28:30PM -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. </blockquote> When or how would the per-superblock wb_err flag get cleared? </blockquote> I don't think unmount + resettable via /sys would be an insane approach. Requiring explicit action to acknowledge data loss isn't a crazy concept. But I think that's something reasonable minds could disagree with. <blockquote> Would all subsequent fsync() calls on that file system now return EIO? Or would only all subsequent syncfs() calls return EIO? </blockquote> If it were tied to syncfs, I wonder if there's a way to have some errseq-type logic. Store a per-superblock (or whatever equivalent thing) errseq value of errors. For each fd calling syncfs(), report the error once, but then store the current value in a separate per-fd field. And if that's considered too weird, only report the errors to fds that have been opened before the error occurred. I can see writing a tool 'pg_run_and_sync /directo /ries -- command' which opens an fd for each of the filesystems the directories reside on, and calls syncfs() after. That would allow using backup/restore tools at least semi-safely. 
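A rough C sketch of the wrapper Andres describes ('pg_run_and_sync /directo /ries -- command'): open a descriptor on each directory before the command runs, wait for the command, then call syncfs() on each descriptor. The function name, MAX_DIRS and the return codes are hypothetical, and it assumes Linux with glibc. With the syncfs() semantics described earlier in the thread it would not reliably catch every data writeback error; it only becomes dependable once something like the proposed per-superblock errseq reports errors "since you opened the fd".

```c
#define _GNU_SOURCE              /* syncfs(), O_CLOEXEC */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_DIRS 64              /* hypothetical limit */

/* Returns 0 on success, 1 if the command failed, 2 if a syncfs() failed,
 * -1 if setup (open/fork/wait) failed. */
int run_and_sync(const char *const dirs[], int ndirs, char *const argv[])
{
    int fds[MAX_DIRS];
    int opened = 0, rc = 0, status;
    pid_t pid;

    assert(ndirs >= 0 && ndirs <= MAX_DIRS && argv[0] != NULL);

    /* Open the fds *before* the command runs, so errors it causes would
     * count as "since you opened the fd" under the proposal. */
    for (; opened < ndirs; opened++) {
        fds[opened] = open(dirs[opened], O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fds[opened] < 0) {
            rc = -1;
            goto out;
        }
    }

    pid = fork();
    if (pid < 0) {
        rc = -1;
        goto out;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0) {
        rc = -1;
        goto out;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        rc = 1;

    /* Directories sharing a filesystem are synced more than once;
     * redundant but harmless. */
    for (int i = 0; i < opened; i++)
        if (syncfs(fds[i]) < 0 && rc == 0)
            rc = 2;
out:
    for (int i = 0; i < opened; i++)
        close(fds[i]);
    return rc;
}
```

A restore would be wrapped as, for example, `run_and_sync(dirs, n, (char *[]){ "tar", "xf", "backup.tar", NULL })`, treating any nonzero result as a failed restore.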
<blockquote> <blockquote> <blockquote> I don't see that that would realistically trigger OOM or the inability to unmount a filesystem. </blockquote> Ted's referring to the current state of affairs where the writeback error is held in the inode; if we can't evict the inode because it's holding the error indicator, that can send us OOM. If instead we transfer the error indicator to the superblock, then there's no problem. </blockquote> Actually, I was referring to the pg-hackers original ask, which was that after an error, all of the dirty pages that couldn't be written out would stay dirty. </blockquote> Well, it's an open list, everyone can argue. And initially people didn't know the OOM explanation, and then it takes some time to revise one's priors :). I think it's a design question that reasonable people can disagree upon (if "hot" removed devices are handled by throwing data away regardless, at least). But as it's clearly not something viable, we can move on to something that can solve the problem. <blockquote> If it's only a single inode which is pinned in memory with the dirty flag, that's bad, but it's not as bad as pinning all of the memory pages for which there was a failed write. We would still need to invent some mechanism or define some semantic when it would be OK to clear the per-inode flag and let the memory associated with that pinned inode get released, though. </blockquote> Yea, I agree that that's not obvious. One way would be to say that it's only automatically cleared when you unlink the file. A bit heavy-handed, but not too crazy. <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Thu, 12 Apr 2018 17:52:52 -0400 </code></pre> On Thu, Apr 12, 2018 at 12:55:36PM -0700, Andres Freund wrote: <blockquote> Any pointers to the underlying netlink mechanism? If we can force postgres to kill itself when such an error is detected (via a dedicated monitoring process), I'd personally be happy enough. 
It'd be nicer if we could associate that knowledge with particular filesystems etc (which would possibly be hard through dm etc?), but this'd be much better than nothing. </blockquote> Yeah, sorry, it never got upstreamed. It's not really all that complicated, it was just that there were some other folks who wanted to do something similar, and there was a round of bike-shedding several years ago, and nothing ever went upstream. Part of the problem was that our original scheme sent up information about file system-level corruption reports --- e.g., those stemming from calls to ext4_error() --- and lots of people had different ideas about how to get all of the possible information up in some structured format. (Think something like uerf from Digital's OSF/1.) We did something really simple/stupid. We just sent essentially an ASCII text string out the netlink socket. That's because what we were doing before was essentially scraping the output of dmesg (e.g. /dev/kmsg). That's actually probably the simplest thing to do, and it has the advantage that it will work even on ancient enterprise kernels that PG users are likely to want to use. So you will need to implement the dmesg text scraper anyway, and that's probably good enough for most use cases. <blockquote> The problem really isn't about recovering from disk errors. Knowing about them is the crucial part. We do not want to give back clients the information that an operation succeeded, when it actually didn't. There could be improvements above that, but as long as it's guaranteed that "we" get the error (rather than just some kernel log we don't have access to, which looks different due to config etc), it's ok. We can throw our hands up in the air and give up. </blockquote> Right, it's a little challenging because the actual regexps you would need to use do vary from device driver to device driver. Fortunately nearly everything is a SCSI/SATA device these days, so there isn't that much variability. 
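A minimal sketch of the dmesg text-scraping approach Ted suggests, using POSIX regular expressions to flag kernel log lines that look like storage I/O errors. The patterns are illustrative assumptions, not an authoritative list: as noted above, the exact messages vary by driver (and kernel version), so any real deployment would have to check them against the hardware actually in use.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Illustrative patterns only; real messages differ between drivers. */
static const char *const io_error_patterns[] = {
    "I/O error",
    "critical (medium|target) error",
};

/* Returns 1 if `line` matches any pattern, 0 if none match,
 * -1 if a pattern fails to compile. */
int looks_like_io_error(const char *line)
{
    size_t n = sizeof io_error_patterns / sizeof io_error_patterns[0];

    assert(line != NULL);
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        int hit;

        if (regcomp(&re, io_error_patterns[i],
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return -1;
        hit = (regexec(&re, line, 0, NULL, 0) == 0);
        regfree(&re);
        if (hit)
            return 1;
    }
    return 0;
}
```

Feeding it each record read from /dev/kmsg is left out; as Andres points out in his reply, a ring buffer can also drop messages before a monitor reads them.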
<blockquote> Yea, agreed on all that. I don't think anybody actually involved in postgres wants to do anything like that. Seems far outside of postgres' remit. </blockquote> Some people on the pg-hackers list were talking about wanting to retry the fsync() and hoping that would cause the write to somehow succeed. It's possible that might help, but it's not likely to be helpful in my experience. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 14:53:19 -0700 </code></pre> On 2018-04-12 17:27:54 -0400, Jeff Layton wrote: <blockquote> On Thu, 2018-04-12 at 13:24 -0700, Andres Freund wrote: <blockquote> At the very least some way to know that such a failure occurred from userland without having to parse the kernel log. As far as I understand, neither sync(2) (and thus sync(1)) nor syncfs(2) is guaranteed to report an error if it was encountered by writeback in the background. If that's indeed true for syncfs(2), even if the fd has been opened before (which I can see how it could happen from an implementation POV, nothing would associate a random FD with failures on different files), it's really impossible to detect this stuff from userland without text parsing. </blockquote> syncfs could use some work. </blockquote> It's really too bad that it doesn't have a flags argument. <blockquote> We probably also need to consider how to better track metadata writeback errors (on e.g. ext2). We don't really do that properly quite yet either. <blockquote> Even if it were just a per-fs /sys/$something file that'd return the current count of unreported errors in a filesystem-independent way, it'd be better than what we have right now. 
<pre><code>1) figure out /sys/$whatnot $directory belongs to
2) oldcount=$(cat /sys/$whatnot/unreported_errors)
3) filesystem operations in $directory
4) sync;sync;
5) newcount=$(cat /sys/$whatnot/unreported_errors)
6) test "$oldcount" -eq "$newcount" || die-with-horrible-message
</code></pre> It isn't beautiful to script, but it's also not absolutely terrible. </blockquote> </blockquote> ext4 seems to have something roughly like that (/sys/fs/ext4/$dev/errors_count), and by my reading it already seems to be incremented from the necessary places. By my reading XFS doesn't seem to have something similar. Wouldn't be bad to standardize... <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Thu, 12 Apr 2018 17:57:56 -0400 </code></pre> On Thu, Apr 12, 2018 at 02:53:19PM -0700, Andres Freund wrote: <blockquote> <blockquote> <blockquote> It isn't beautiful to script, but it's also not absolutely terrible. </blockquote> </blockquote> ext4 seems to have something roughly like that (/sys/fs/ext4/$dev/errors_count), and by my reading it already seems to be incremented from the necessary places. </blockquote> This is only for file system inconsistencies noticed by the kernel. We don't bump that count for data block I/O errors. The same idea could be used on a block device level. It would be pretty simple to maintain a counter for I/O errors, and when the last error was detected on a particular device. You could even break out and track read errors and write errors separately if that would be useful. If you don't care what block was bad, but just that some I/O error had happened, a counter is definitely the simplest approach, and less hair to implement and use than something like a netlink channel or scraping dmesg.... <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Thu, 12 Apr 2018 15:03:59 -0700 </code></pre> Hi, On 2018-04-12 17:52:52 -0400, Theodore Y. Ts'o wrote: <blockquote> We did something really simple/stupid. 
We just sent essentially an ASCII text string out the netlink socket. That's because what we were doing before was essentially scraping the output of dmesg (e.g. /dev/kmsg). That's actually probably the simplest thing to do, and it has the advantage that it will work even on ancient enterprise kernels that PG users are likely to want to use. So you will need to implement the dmesg text scraper anyway, and that's probably good enough for most use cases. </blockquote> The worst part of that is, as you mention below, needing to handle a lot of different error message formats. I guess it's reasonable enough if you control your hardware, but no such luck. Aren't there quite realistic scenarios where one could miss kmsg-style messages due to it being a ring buffer? <blockquote> Right, it's a little challenging because the actual regexps you would need to use do vary from device driver to device driver. Fortunately nearly everything is a SCSI/SATA device these days, so there isn't that much variability. </blockquote> There's also SAN / NAS type stuff - not all of that presents as a SCSI/SATA device, right? <blockquote> <blockquote> Yea, agreed on all that. I don't think anybody actually involved in postgres wants to do anything like that. Seems far outside of postgres' remit. </blockquote> Some people on the pg-hackers list were talking about wanting to retry the fsync() and hoping that would cause the write to somehow succeed. It's possible that might help, but it's not likely to be helpful in my experience. </blockquote> Depends on the type of error and storage. ENOSPC, especially over NFS, has some reasonable chances of being cleared up. And for networked block storage it's also not impossible to think of scenarios where that'd work for EIO. But I think that, besides the hope of the condition clearing up by itself, it has the advantage that it can trivially give some feedback to the user. 
The user'll get back strerror(ENOSPC) with some decent SQL error code, which'll hopefully cause them to investigate (well, once monitoring detects high error rates). It's much nicer for the user to type COMMIT; get an appropriate error back etc, than if the database just commits suicide. <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Fri, 13 Apr 2018 08:44:04 +1000 </code></pre> On Thu, Apr 12, 2018 at 11:08:50AM -0400, Jeff Layton wrote: <blockquote> On Thu, 2018-04-12 at 22:01 +1000, Dave Chinner wrote: <blockquote> On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: <blockquote> When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? </blockquote> There isn't a right thing. Whatever we do will be wrong for someone. <blockquote> One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. </blockquote> Not to mention a POSIX IO ordering violation. Seeing stale data after a "successful" write is simply not allowed. </blockquote> I'm not so sure here, given that we're dealing with an error condition. Are we really obligated not to allow any changes to pages that we can't write back? </blockquote> Posix says this about write(): <pre><code> After a write() to a regular file has successfully returned: Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified. </code></pre> IOWs, even if there is a later error, we told the user the write was successful, and so according to POSIX we are not allowed to wind back the data to what it was before the write() occurred. 
<blockquote> Given that the pages are clean after these failures, we aren't doing this even today: Suppose we're unable to do writes but can do reads vs. the backing store. After a wb failure, the page has the dirty bit cleared. If it gets kicked out of the cache before the read occurs, it'll have to be faulted back in. Poof -- your write just disappeared. </blockquote> Yes - I was pointing out what the specification we supposedly conform to says about this behaviour, not that our current behaviour conforms to the spec. Indeed, have you even noticed xfs_aops_discard_page() and its surrounding context on page writeback submission errors? To save you looking, XFS will trash the page contents completely on a filesystem level ->writepage error. It doesn't mark them "clean", doesn't attempt to redirty and rewrite them - it clears the uptodate state and may invalidate it completely. IOWs, the data written "successfully" to the cached page is now gone. It will be re-read from disk on the next read() call, in direct violation of the above POSIX requirements. This is my point: we've done that in XFS knowing that we violate POSIX specifications in this specific corner case - it's the lesser of many evils we have to choose between. Hence if we choose to encode that behaviour as the general writeback IO error handling algorithm, then it needs to be done with the knowledge it is a specification violation. Not to mention be documented as a POSIX violation in the various relevant man pages and that this is how all filesystems will behave on async writeback error..... 
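To make the trade-off concrete, here is a toy C model of a single cached page over one disk block, comparing three reactions to a failed writeback: clearing the dirty bit but keeping the data (the behaviour Jeff describes), discarding the cached contents (XFS's behaviour as Dave describes it), and keeping the page dirty (the original pg-hackers ask, which Ted notes risks pinning memory). The structure, policy names and functions are hypothetical and only mirror the uptodate/dirty flags discussed here; no real kernel code is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* One cached page over one disk block; names are illustrative only. */
struct page {
    int  data;
    bool uptodate;   /* cached data is valid */
    bool dirty;      /* cached data still needs writing back */
};

enum wb_fail_policy {
    MARK_CLEAN,      /* clear dirty, keep the data in cache */
    DISCARD,         /* clear dirty and uptodate: data is dropped */
    KEEP_DIRTY,      /* leave it dirty for a later retry (pins memory) */
};

static int disk;     /* contents of the backing block */

static void do_write(struct page *p, int v)
{
    p->data = v;
    p->uptodate = true;
    p->dirty = true;
}

/* A writeback attempt that fails; the block on disk is unchanged. */
static void writeback_fails(struct page *p, enum wb_fail_policy pol)
{
    assert(p->dirty);            /* only dirty pages are written back */
    if (pol == MARK_CLEAN || pol == DISCARD)
        p->dirty = false;
    if (pol == DISCARD)
        p->uptodate = false;
}

/* Memory pressure: a clean page may be evicted from the cache. */
static void try_evict(struct page *p)
{
    if (!p->dirty)
        p->uptodate = false;
}

/* read(): fault the block back in if the cached copy is not valid. */
static int do_read(struct page *p)
{
    if (!p->uptodate) {
        p->data = disk;
        p->uptodate = true;
    }
    return p->data;
}
```

With DISCARD the next read returns the old disk contents immediately (the POSIX violation Dave describes); with MARK_CLEAN the written data survives only until the page is evicted (Jeff's "Poof" case); with KEEP_DIRTY it survives, at the cost of memory that cannot be reclaimed.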
<hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Fri, 13 Apr 2018 08:56:38 -0400 </code></pre> On Thu, 2018-04-12 at 14:31 -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 05:14:54PM -0400, Jeff Layton wrote: <blockquote> On Thu, 2018-04-12 at 13:28 -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 12, 2018 at 01:13:22PM -0700, Andres Freund wrote: <blockquote> I think a per-file or even per-blockdev/fs error state that'd be returned by fsync() would be more than sufficient. </blockquote> Ah; this was my suggestion to Jeff on IRC. That we add a per-superblock wb_err and then allow syncfs() to return it. So you'd open an fd on a directory (for example), and call syncfs() which would return -EIO or -ENOSPC if either of those conditions had occurred since you opened the fd. </blockquote> Not a bad idea and shouldn't be too costly. mapping_set_error could flag the superblock one before or after the one in the mapping. We'd need to define what happens if you interleave fsync and syncfs calls on the same inode though. How do we handle file->f_wb_err in that case? Would we need a second field in struct file to act as the per-sb error cursor? </blockquote> Ooh. I hadn't thought that through. Bleh. I don't want to add a field to struct file for this uncommon case. Maybe O_PATH could be used for this? It gets you a file descriptor on a particular filesystem, so syncfs() is defined, but it can't report a writeback error. So if you open something O_PATH, you can use the file's f_wb_err for the mapping's error cursor. </blockquote> That might work. It'd be a syscall behavioral change so we'd need to document that well. It's probably innocuous though -- I doubt we have a lot of callers in the field opening files with O_PATH and calling syncfs on them. 
<hr /> <pre><code>From: Jeff Layton <jlayton@...hat.com> Date: Fri, 13 Apr 2018 09:18:56 -0400 </code></pre> On Fri, 2018-04-13 at 08:44 +1000, Dave Chinner wrote: <blockquote> On Thu, Apr 12, 2018 at 11:08:50AM -0400, Jeff Layton wrote: <blockquote> On Thu, 2018-04-12 at 22:01 +1000, Dave Chinner wrote: <blockquote> On Thu, Apr 12, 2018 at 07:09:14AM -0400, Jeff Layton wrote: <blockquote> When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? </blockquote> There isn't a right thing. Whatever we do will be wrong for someone. <blockquote> One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. </blockquote> Not to mention a POSIX IO ordering violation. Seeing stale data after a "successful" write is simply not allowed. </blockquote> I'm not so sure here, given that we're dealing with an error condition. Are we really obligated not to allow any changes to pages that we can't write back? </blockquote> Posix says this about write(): After a write() to a regular file has successfully returned: <pre><code> Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified. </code></pre> IOWs, even if there is a later error, we told the user the write was successful, and so according to POSIX we are not allowed to wind back the data to what it was before the write() occurred. <blockquote> Given that the pages are clean after these failures, we aren't doing this even today: Suppose we're unable to do writes but can do reads vs. the backing store. After a wb failure, the page has the dirty bit cleared. 
If it gets kicked out of the cache before the read occurs, it'll have to be faulted back in. Poof -- your write just disappeared. </blockquote> Yes - I was pointing out what the specification we supposedly conform to says about this behaviour, not that our current behaviour conforms to the spec. Indeed, have you even noticed xfs_aops_discard_page() and its surrounding context on page writeback submission errors? To save you looking, XFS will trash the page contents completely on a filesystem level ->writepage error. It doesn't mark them "clean", doesn't attempt to redirty and rewrite them - it clears the uptodate state and may invalidate it completely. IOWs, the data written "successfully" to the cached page is now gone. It will be re-read from disk on the next read() call, in direct violation of the above POSIX requirements. This is my point: we've done that in XFS knowing that we violate POSIX specifications in this specific corner case - it's the lesser of many evils we have to choose between. Hence if we choose to encode that behaviour as the general writeback IO error handling algorithm, then it needs to be done with the knowledge it is a specification violation. Not to mention be documented as a POSIX violation in the various relevant man pages and that this is how all filesystems will behave on async writeback error..... </blockquote> Got it, thanks. Yes, I think we ought to probably do the same thing globally. It's nice to know that xfs has already been doing this. That makes me feel better about making this behavior the gold standard for Linux filesystems. So to summarize, at this point in the discussion, I think we want to consider doing the following: <ul> <li>better reporting from syncfs (report an error when even one inode failed to be written back since last syncfs call). 
We'll probably implement this via a per-sb errseq_t in some fashion, though there are some implementation issues to work out.</li> <li>invalidate or clear uptodate flag on pages that experience writeback errors, across filesystems. Encourage this as standard behavior for filesystems and maybe add helpers to make it easier to do this.</li> </ul> Did I miss anything? Would that be enough to help the Pg usecase? I don't see us ever being able to reasonably support its current expectation that writeback errors will be seen on fd's that were opened after the error occurred. That's a really thorny problem from an object lifetime perspective. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Fri, 13 Apr 2018 06:25:35 -0700 </code></pre> Hi, On 2018-04-13 09:18:56 -0400, Jeff Layton wrote: <blockquote> Yes, I think we ought to probably do the same thing globally. It's nice to know that xfs has already been doing this. That makes me feel better about making this behavior the gold standard for Linux filesystems. So to summarize, at this point in the discussion, I think we want to consider doing the following: <ul> <li>better reporting from syncfs (report an error when even one inode failed to be written back since last syncfs call). We'll probably implement this via a per-sb errseq_t in some fashion, though there are some implementation issues to work out.</li> <li>invalidate or clear uptodate flag on pages that experience writeback errors, across filesystems. Encourage this as standard behavior for filesystems and maybe add helpers to make it easier to do this.</li> </ul> Did I miss anything? Would that be enough to help the Pg usecase? I don't see us ever being able to reasonably support its current expectation that writeback errors will be seen on fd's that were opened after the error occurred. That's a really thorny problem from an object lifetime perspective. 
</blockquote> It's not perfect, but I think the amount of hacky OS-specific code should be acceptable. And it does allow for a wrapper tool that can be used around backup restores etc. to syncfs all the necessary filesystems. Let me mull with others for a bit. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Fri, 13 Apr 2018 07:02:32 -0700 </code></pre> On Fri, Apr 13, 2018 at 09:18:56AM -0400, Jeff Layton wrote: <blockquote> On Fri, 2018-04-13 at 08:44 +1000, Dave Chinner wrote: <blockquote> To save you looking, XFS will trash the page contents completely on a filesystem level ->writepage error. It doesn't mark them "clean", doesn't attempt to redirty and rewrite them - it clears the uptodate state and may invalidate it completely. IOWs, the data written "successfully" to the cached page is now gone. It will be re-read from disk on the next read() call, in direct violation of the above POSIX requirements. This is my point: we've done that in XFS knowing that we violate POSIX specifications in this specific corner case - it's the lesser of many evils we have to choose between. Hence if we choose to encode that behaviour as the general writeback IO error handling algorithm, then it needs to be done with the knowledge it is a specification violation. Not to mention be documented as a POSIX violation in the various relevant man pages and that this is how all filesystems will behave on async writeback error..... </blockquote> Got it, thanks. Yes, I think we ought to probably do the same thing globally. It's nice to know that xfs has already been doing this. That makes me feel better about making this behavior the gold standard for Linux filesystems. So to summarize, at this point in the discussion, I think we want to consider doing the following: <ul> <li>better reporting from syncfs (report an error when even one inode failed to be written back since last syncfs call). 
We'll probably implement this via a per-sb errseq_t in some fashion, though there are some implementation issues to work out.</li> <li>invalidate or clear uptodate flag on pages that experience writeback errors, across filesystems. Encourage this as standard behavior for filesystems and maybe add helpers to make it easier to do this.</li> </ul> Did I miss anything? Would that be enough to help the Pg usecase? I don't see us ever being able to reasonably support its current expectation that writeback errors will be seen on fd's that were opened after the error occurred. That's a really thorny problem from an object lifetime perspective. </blockquote> I think we can do better than XFS is currently doing (but I agree that we should have the same behaviour across all Linux filesystems!) <ol> <li>If we get an error while wbc->for_background is true, we should not clear uptodate on the page, rather SetPageError and SetPageDirty.</li> <li>Background writebacks should skip pages which are PageError.</li> <li>for_sync writebacks should attempt one last write. Maybe it'll succeed this time. If it does, just ClearPageError. If not, we have somebody to report this writeback error to, and ClearPageUptodate.</li> </ol> I think kupdate writes are the same as for_background writes. for_reclaim is tougher. I don't want to see us getting into OOM because we're hanging onto stale data, but we don't necessarily have an open fd to report the error on. I think I'm leaning towards behaving the same for for_reclaim as for_sync, but this is probably a subject on which reasonable people can disagree. And this logic all needs to be in one place, although invoked from each filesystem. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Fri, 13 Apr 2018 07:48:07 -0700 </code></pre> On Tue, Apr 10, 2018 at 03:07:26PM -0700, Andres Freund wrote: <blockquote> I don't think that's the full issue. 
We can deal with the fact that an fsync failure is edge-triggered if there's a guarantee that every process doing so would get it. The fact that one needs to have an FD open from before any failing writes occurred to get a failure, THAT'S the big issue. Beyond postgres, it's a pretty common approach to do work on a lot of files without fsyncing, then iterate over the directory, fsync everything, and then assume you're safe. But unless I severely misunderstand something, that'd only be safe if you kept an FD for every file open, which isn't realistic for pretty obvious reasons. </blockquote> While accepting that under memory pressure we can still evict the error indicators, we can do a better job than we do today. The current design of error reporting says that all errors which occurred before you opened the file descriptor are of no interest to you. I don't think that's necessarily true, and it's actually a change of behaviour from before the errseq work. Consider Stupid Task A which calls open(), write(), close(), and Smart Task B which calls open(), write(), fsync(), close() operating on the same file. If A goes entirely before B and encounters an error, before errseq_t, B would see the error from A's write. If A and B overlap, even a little bit, then B still gets to see A's error today. But if writeback happens for A's write before B opens the file then B will never see the error. B doesn't want to see historical errors that a previous invocation of B has already handled, but we know whether anyone has seen the error or not. 
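To make the Task A / Task B scenario concrete, it can be modelled in a few lines of userspace C. This is a minimal, single-threaded sketch, not kernel code. It assumes the errseq_t layout of lib/errseq.c at the time: the low 12 bits hold the errno, bit 12 is the SEEN flag, and a counter sits above that. It drops the READ_ONCE()/cmpxchg() atomics, and all `model_*` names are hypothetical. `model_sample_before()` mirrors errseq_sample() as it stood, `model_sample_patched()` mirrors the version in the patch in this message, and `model_check_and_advance()` approximates errseq_check_and_advance(), which the fsync() path relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified, single-threaded model of errseq_t (no atomics).
 * Assumed layout, following lib/errseq.c: low 12 bits = positive errno,
 * bit 12 = SEEN flag, higher bits = counter bumped when a new error
 * arrives after the previous one was seen. */
typedef uint32_t errseq_t;

#define MAX_ERRNO      4095u
#define ERRSEQ_SEEN    (1u << 12)
#define ERRSEQ_CTR_INC (1u << 13)

/* Record a writeback error; err is a negative errno such as -EIO. */
static void model_errseq_set(errseq_t *eseq, int err)
{
    errseq_t old = *eseq;
    errseq_t new;

    if (err >= 0 || err < -(int)MAX_ERRNO)
        return;
    new = (old & ~(MAX_ERRNO | ERRSEQ_SEEN)) | (errseq_t)(-err);
    if (old & ERRSEQ_SEEN)
        new += ERRSEQ_CTR_INC;  /* make the new error distinguishable */
    *eseq = new;
}

/* errseq_sample() before the patch: any error present is marked SEEN
 * at open() time, so the new file descriptor never reports it. */
static errseq_t model_sample_before(errseq_t *eseq)
{
    errseq_t cur = *eseq;

    if (cur != 0) {
        cur |= ERRSEQ_SEEN;
        *eseq = cur;            /* the kernel uses cmpxchg() here */
    }
    return cur;
}

/* errseq_sample() after the patch: an error nobody has seen yet is
 * left for this file descriptor to report. */
static errseq_t model_sample_patched(const errseq_t *eseq)
{
    errseq_t cur = *eseq;

    if (!(cur & ERRSEQ_SEEN))
        cur = 0;
    return cur;
}

/* Approximation of errseq_check_and_advance() on the fsync() path:
 * report an error if the value moved since *since, then mark it SEEN. */
static int model_check_and_advance(errseq_t *eseq, errseq_t *since)
{
    errseq_t cur = *eseq;

    if (cur == *since)
        return 0;
    cur |= ERRSEQ_SEEN;
    *eseq = cur;
    *since = cur;
    return -(int)(cur & MAX_ERRNO);
}
```

With the old sampling, B opens after A's writeback has already failed, and B's fsync() returns 0. With the patched sampling, B's fsync() returns -EIO, and a later opener gets 0 because B has already marked the error SEEN.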
So here's a patch which restores the historical behaviour of seeing old unhandled errors on a fresh file descriptor: Signed-off-by: Matthew Wilcox <a href="mailto:mawilcox@...rosoft.com">mawilcox@...rosoft.com</a> <pre><code>diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..093f1fba4ee0 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;
 
-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;
 }
 EXPORT_SYMBOL(errseq_sample);
 
</code></pre> <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Sat, 14 Apr 2018 11:47:52 +1000 </code></pre> On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote: <blockquote> On Fri, Apr 13, 2018 at 09:18:56AM -0400, Jeff Layton wrote: <blockquote> On Fri, 2018-04-13 at 08:44 +1000, Dave Chinner wrote: <blockquote> To save you looking, XFS will trash the page contents completely on a filesystem level ->writepage error. It doesn't mark them "clean", doesn't attempt to redirty and rewrite them - it clears the uptodate state and may invalidate it completely. IOWs, the data written "successfully" to the cached page is now gone. It will be re-read from disk on the next read() call, in direct violation of the above POSIX requirements. This is my point: we've done that in XFS knowing that we violate POSIX specifications in this specific corner case - it's the lesser of many evils we have to choose between. 
Hence if we choose to encode that behaviour as the general writeback IO error handling algorithm, then it needs to be done with the knowledge it is a specification violation. Not to mention be documented as a POSIX violation in the various relevant man pages and that this is how all filesystems will behave on async writeback error..... </blockquote> Got it, thanks. Yes, I think we ought to probably do the same thing globally. It's nice to know that xfs has already been doing this. That makes me feel better about making this behavior the gold standard for Linux filesystems. So to summarize, at this point in the discussion, I think we want to consider doing the following: <ul> <li>better reporting from syncfs (report an error when even one inode failed to be written back since last syncfs call). We'll probably implement this via a per-sb errseq_t in some fashion, though there are some implementation issues to work out.</li> <li>invalidate or clear uptodate flag on pages that experience writeback errors, across filesystems. Encourage this as standard behavior for filesystems and maybe add helpers to make it easier to do this.</li> </ul> Did I miss anything? Would that be enough to help the Pg usecase? I don't see us ever being able to reasonably support its current expectation that writeback errors will be seen on fd's that were opened after the error occurred. That's a really thorny problem from an object lifetime perspective. </blockquote> I think we can do better than XFS is currently doing (but I agree that we should have the same behaviour across all Linux filesystems!) <ol> <li>If we get an error while wbc->for_background is true, we should not clear uptodate on the page, rather SetPageError and SetPageDirty.</li> </ol> </blockquote> So you're saying we should treat it as a transient error rather than a permanent error. 
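Willy's three proposed rules can be sketched as a small state machine: a background failure sets PageError and PageDirty, background writeback skips PageError pages, and for_sync makes one last attempt. This is a hypothetical userspace model. The booleans stand in for the PageUptodate/PageDirty/PageError flags and are not the kernel's page-flag helpers. The choice to clear the dirty state after a failed sync is an assumption the proposal does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page state: plain booleans standing in for the kernel's
 * PageUptodate / PageDirty / PageError flags. */
struct model_page {
    bool uptodate;
    bool dirty;
    bool error;
};

enum model_wb_mode { MODEL_WB_BACKGROUND, MODEL_WB_SYNC };

/* One writeback pass over a page under the proposed rules. io_ok says
 * whether the simulated device write succeeds. Returns -EIO only when
 * there is a caller (a sync writeback) to report the failure to. */
static int model_writeback(struct model_page *p, enum model_wb_mode mode,
                           bool io_ok)
{
    if (!p->dirty)
        return 0;

    if (mode == MODEL_WB_BACKGROUND) {
        if (p->error)           /* rule 2: skip pages already in error */
            return 0;
        if (io_ok) {
            p->dirty = false;
            return 0;
        }
        p->error = true;        /* rule 1: keep the data, stay dirty */
        return 0;
    }

    /* Rule 3: a sync writeback makes one last attempt. */
    if (io_ok) {
        p->error = false;
        p->dirty = false;
        return 0;
    }
    p->uptodate = false;        /* give up on the cached copy */
    p->dirty = false;           /* assumption: discarded data is not rewritten */
    return -EIO;
}
```

The second background pass in the test below shows Dave's objection. Once a transient error has cleared, the page still waits in memory until something issues a sync.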
<blockquote> <ol> <li>Background writebacks should skip pages which are PageError.</li> </ol> </blockquote> That seems decidedly dodgy in the case where there is a transient error - it requires a user to specifically run sync to get the data to disk after the transient error has occurred. Say they don't notice the problem because it's fleeting and doesn't cause any obvious problems? e.g. XFS gets to enospc, runs out of reserve pool blocks so can't allocate space to write back the page, then space is freed up a few seconds later and so the next write will work just fine. This is a recipe for "I lost data that I wrote /days/ before the system crashed" bug reports. <blockquote> <ol> <li>for_sync writebacks should attempt one last write. Maybe it'll succeed this time. If it does, just ClearPageError. If not, we have somebody to report this writeback error to, and ClearPageUptodate.</li> </ol> </blockquote> Which may well be unmount. Are we really going to wait until unmount to report fatal errors? We used to do this with XFS metadata. We'd just keep trying to write metadata and keep the filesystem running (because it's consistent in memory and it might be a transient error) rather than shutting down the filesystem after a couple of retries. The result was that users wouldn't notice there were problems until unmount, and the most common symptom of that was "why is system shutdown hanging?". We now don't hang at unmount by default: <pre><code>$ cat /sys/fs/xfs/dm-0/error/fail_at_unmount
1
$
</code></pre> And we treat different errors according to their seriousness. EIO and device ENOSPC we default to retry forever because they are often transient, but for ENODEV we fail and shutdown immediately (someone pulled the USB stick out). metadata failure behaviour is configured via changing fields in /sys/fs/xfs//error/metadata//... We've planned to extend this failure configuration to data IO, too, but never quite got around to it yet. 
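The per-error policy Dave describes retries EIO and device ENOSPC forever, fails ENODEV immediately, and stops retrying at unmount when fail_at_unmount is set. It can be sketched as a lookup table. This is a hypothetical stand-in only. The real knobs are the XFS sysfs fields mentioned above, and the table, field names and the "unknown errors fail immediately" default are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-error policy table, loosely following the XFS error
 * configuration described in the thread; not the actual sysfs layout. */
#define RETRY_FOREVER (-1)

struct model_err_policy {
    int err;            /* positive errno value */
    int max_retries;    /* RETRY_FOREVER, or a finite limit (0 = fail now) */
};

static const struct model_err_policy model_policies[] = {
    { EIO,    RETRY_FOREVER },  /* often transient */
    { ENOSPC, RETRY_FOREVER },  /* device ENOSPC: often transient */
    { ENODEV, 0 },              /* device gone: fail and shut down */
};

/* Decide whether a failed write should be retried. fail_at_unmount
 * models the knob of that name: once unmount starts, stop retrying so
 * that unmount cannot hang on a failing device. */
static bool model_should_retry(int err, int retries_so_far,
                               bool unmounting, bool fail_at_unmount)
{
    if (unmounting && fail_at_unmount)
        return false;
    for (size_t i = 0; i < sizeof model_policies / sizeof model_policies[0]; i++) {
        if (model_policies[i].err != err)
            continue;
        if (model_policies[i].max_retries == RETRY_FOREVER)
            return true;
        return retries_so_far < model_policies[i].max_retries;
    }
    return false;   /* assumption: unknown errors fail immediately */
}
```

Keeping the policy in data rather than in code is what makes the "one size doesn't fit all" point workable. An administrator could change a row without touching the writeback logic.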
This is a clear example of "one size doesn't fit all" and I think we'll end up doing the same sort of error behaviour configuration in XFS for these cases. (i.e. /sys/fs/xfs//error/writeback//....) <blockquote> And this logic all needs to be in one place, although invoked from each filesystem. </blockquote> Perhaps so, but as there's no "one-size-fits-all" behaviour, I really want to extend the XFS error config infrastructure to control what the filesystem does on error here. <hr /> <pre><code>From: Andres Freund <andres@...razel.de> Date: Fri, 13 Apr 2018 19:04:33 -0700 </code></pre> Hi, On 2018-04-14 11:47:52 +1000, Dave Chinner wrote: <blockquote> And we treat different errors according to their seriousness. EIO and device ENOSPC we default to retry forever because they are often transient, but for ENODEV we fail and shutdown immediately (someone pulled the USB stick out). metadata failure behaviour is configured via changing fields in /sys/fs/xfs//error/metadata//... We've planned to extend this failure configuration to data IO, too, but never quite got around to it yet. This is a clear example of "one size doesn't fit all" and I think we'll end up doing the same sort of error behaviour configuration in XFS for these cases. (i.e. /sys/fs/xfs//error/writeback//....) </blockquote> Have you considered adding an ext/fat/jfs errors=remount-ro/panic/continue style mount parameter? <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Fri, 13 Apr 2018 19:38:14 -0700 </code></pre> On Sat, Apr 14, 2018 at 11:47:52AM +1000, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote: <blockquote> <ol> <li>If we get an error while wbc->for_background is true, we should not clear uptodate on the page, rather SetPageError and SetPageDirty.</li> </ol> </blockquote> So you're saying we should treat it as a transient error rather than a permanent error. 
</blockquote> Yes, I'm proposing leaving the data in memory in case the user wants to try writing it somewhere else. <blockquote> <blockquote> <ol> <li>Background writebacks should skip pages which are PageError.</li> </ol> </blockquote> That seems decidedly dodgy in the case where there is a transient error - it requires a user to specifically run sync to get the data to disk after the transient error has occurred. Say they don't notice the problem because it's fleeting and doesn't cause any obvious problems? </blockquote> That's fair. What I want to avoid is triggering the same error every 30 seconds (or whatever the periodic writeback threshold is set to). <blockquote> e.g. XFS gets to enospc, runs out of reserve pool blocks so can't allocate space to write back the page, then space is freed up a few seconds later and so the next write will work just fine. This is a recipe for "I lost data that I wrote /days/ before the system crashed" bug reports. </blockquote> So ... exponential backoff on retries? <blockquote> <blockquote> <ol> <li>for_sync writebacks should attempt one last write. Maybe it'll succeed this time. If it does, just ClearPageError. If not, we have somebody to report this writeback error to, and ClearPageUptodate.</li> </ol> </blockquote> Which may well be unmount. Are we really going to wait until unmount to report fatal errors? </blockquote> Goodness, no. The errors would be immediately reportable using the wb_err mechanism, as soon as the first error was encountered. <hr /> <hr /> <pre><code>From: bfields@...ldses.org (J. Bruce Fields) Date: Wed, 18 Apr 2018 12:52:19 -0400 </code></pre> <blockquote> Theodore Y. Ts'o - 10.04.18, 20:43: <blockquote> First of all, what storage devices will do when they hit an exception condition is quite non-deterministic. For example, the vast majority of SSD's are not power fail certified. 
What this means is that if they suffer a power drop while they are doing a GC, it is quite possible for data written six months ago to be lost as a result. The LBA could potentially be far, far away from any LBA's that were recently written, and there could have been multiple CACHE FLUSH operations in the time since the LBA in question was last written six months ago. No matter; for a consumer-grade SSD, it's possible for that LBA to be trashed after an unexpected power drop. </blockquote> </blockquote> Pointers to documentation or papers or anything? The only Google results I can find for "power fail certified" are your posts. I've always been confused by SSD power-loss protection, as nobody seems completely clear whether it's a safety or a performance feature. <hr /> <pre><code>From: bfields@...ldses.org (J. Bruce Fields) Date: Wed, 18 Apr 2018 14:09:03 -0400 </code></pre> On Wed, Apr 11, 2018 at 07:17:52PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-11 15:52:44 -0600, Andreas Dilger wrote: <blockquote> On Apr 10, 2018, at 4:07 PM, Andres Freund <a href="mailto:andres@...razel.de">andres@...razel.de</a> wrote: <blockquote> 2018-04-10 18:43:56 Ted wrote: <blockquote> So for better or for worse, there has not been as much investment in buffered I/O and data robustness in the face of exception handling of storage devices. </blockquote> That's a bit of a cop-out. It's not just databases that care. Even more basic tools like SCM, package managers and editors care whether they can get proper responses back from fsync that imply things actually were synced. </blockquote> Sure, but it is mostly PG that is doing (IMHO) crazy things like writing to thousands(?) of files, closing the file descriptors, then expecting fsync() on a newly-opened fd to return a historical error. </blockquote> It's not just postgres. 
dpkg (underlying apt, on Debian-derived distros), to take an example I just randomly guessed, does too: <pre><code>/* We want to guarantee the extracted files are on the disk, so that the
 * subsequent renames to the info database do not end up with old or zero
 * length files in case of a system crash. As neither dpkg-deb nor tar do
 * explicit fsync()s, we have to do them here.
 * XXX: This could be avoided by switching to an internal tar extractor. */
dir_sync_contents(cidir);
</code></pre> (a bunch of other places too) Especially on ext3 but also on newer filesystems it's performance-wise entirely infeasible to fsync() every single file individually - the performance becomes entirely atrocious if you do that. </blockquote> Is that still true if you're able to use some kind of parallelism? (async io, or fsync from multiple processes?) <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Thu, 19 Apr 2018 09:59:50 +1000 </code></pre> On Fri, Apr 13, 2018 at 07:04:33PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-14 11:47:52 +1000, Dave Chinner wrote: <blockquote> And we treat different errors according to their seriousness. EIO and device ENOSPC we default to retry forever because they are often transient, but for ENODEV we fail and shutdown immediately (someone pulled the USB stick out). metadata failure behaviour is configured via changing fields in /sys/fs/xfs//error/metadata//... We've planned to extend this failure configuration to data IO, too, but never quite got around to it yet. This is a clear example of "one size doesn't fit all" and I think we'll end up doing the same sort of error behaviour configuration in XFS for these cases. (i.e. /sys/fs/xfs//error/writeback//....) </blockquote> Have you considered adding an ext/fat/jfs errors=remount-ro/panic/continue style mount parameter? </blockquote> That's for metadata writeback error behaviour, not data writeback IO errors. We are definitely not planning to add mount options to configure IO error behaviors. 
Mount options are a horrible way to configure filesystem behaviour and we've already got other, fine-grained configuration infrastructure for configuring IO error behaviour. Which, as I just pointed out, was designed to be extended to data writeback and other operational error handling in the filesystem (e.g. dealing with ENOMEM in different ways). <hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Thu, 19 Apr 2018 10:13:43 +1000 </code></pre> On Fri, Apr 13, 2018 at 07:38:14PM -0700, Matthew Wilcox wrote: <blockquote> On Sat, Apr 14, 2018 at 11:47:52AM +1000, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote: <blockquote> <ol> <li>If we get an error while wbc->for_background is true, we should not clear uptodate on the page, rather SetPageError and SetPageDirty.</li> </ol> </blockquote> So you're saying we should treat it as a transient error rather than a permanent error. </blockquote> Yes, I'm proposing leaving the data in memory in case the user wants to try writing it somewhere else. </blockquote> And if it's getting IO errors because of USB stick pull? What then? <blockquote> <blockquote> <blockquote> <ol> <li>Background writebacks should skip pages which are PageError.</li> </ol> </blockquote> That seems decidedly dodgy in the case where there is a transient error - it requires a user to specifically run sync to get the data to disk after the transient error has occurred. Say they don't notice the problem because it's fleeting and doesn't cause any obvious problems? </blockquote> That's fair. What I want to avoid is triggering the same error every 30 seconds (or whatever the periodic writeback threshold is set to). </blockquote> So if kernel ring buffer overflows and so users miss the first error report, they'll have no idea that the data writeback is still failing? <blockquote> <blockquote> e.g. 
XFS gets to enospc, runs out of reserve pool blocks so can't allocate space to write back the page, then space is freed up a few seconds later and so the next write will work just fine. This is a recipe for "I lost data that I wrote /days/ before the system crashed" bug reports. </blockquote> So ... exponential backoff on retries? </blockquote> Maybe, but I don't think that actually helps anything and adds yet more "when should we write this" complication to inode writeback.... <blockquote> <blockquote> <blockquote> <ol> <li>for_sync writebacks should attempt one last write. Maybe it'll succeed this time. If it does, just ClearPageError. If not, we have somebody to report this writeback error to, and ClearPageUptodate.</li> </ol> </blockquote> Which may well be unmount. Are we really going to wait until unmount to report fatal errors? </blockquote> Goodness, no. The errors would be immediately reportable using the wb_err mechanism, as soon as the first error was encountered. </blockquote> But if there are no open files when the error occurs, that error won't get reported to anyone. Which means the next time anyone accesses that inode from a user context could very well be unmount or a third party sync/syncfs().... <hr /> <pre><code>From: Eric Sandeen <esandeen@...hat.com> Date: Wed, 18 Apr 2018 19:23:46 -0500 </code></pre> On 4/18/18 6:59 PM, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:04:33PM -0700, Andres Freund wrote: <blockquote> Hi, On 2018-04-14 11:47:52 +1000, Dave Chinner wrote: <blockquote> And we treat different errors according to their seriousness. EIO and device ENOSPC we default to retry forever because they are often transient, but for ENODEV we fail and shutdown immediately (someone pulled the USB stick out). metadata failure behaviour is configured via changing fields in /sys/fs/xfs//error/metadata//... We've planned to extend this failure configuration to data IO, too, but never quite got around to it yet. 
This is a clear example of "one size doesn't fit all" and I think we'll end up doing the same sort of error behaviour configuration in XFS for these cases. (i.e. /sys/fs/xfs//error/writeback//....) </blockquote> Have you considered adding an ext/fat/jfs errors=remount-ro/panic/continue style mount parameter? </blockquote> That's for metadata writeback error behaviour, not data writeback IO errors. </blockquote> /me points casually at data_err=abort & data_err=ignore in ext4... <pre><code>data_err=ignore  Just print an error message if an error occurs
                 in a file data buffer in ordered mode.
data_err=abort   Abort the journal if an error occurs in a file
                 data buffer in ordered mode.
</code></pre> Just sayin' <blockquote> We are definitely not planning to add mount options to configure IO error behaviors. Mount options are a horrible way to configure filesystem behaviour and we've already got other, fine-grained configuration infrastructure for configuring IO error behaviour. Which, as I just pointed out, was designed to be extended to data writeback and other operational error handling in the filesystem (e.g. dealing with ENOMEM in different ways). </blockquote> I don't disagree, but there are already mount-option knobs in ext4, FWIW. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Wed, 18 Apr 2018 17:40:37 -0700 </code></pre> On Thu, Apr 19, 2018 at 10:13:43AM +1000, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:38:14PM -0700, Matthew Wilcox wrote: <blockquote> On Sat, Apr 14, 2018 at 11:47:52AM +1000, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote: <blockquote> <ol> <li>If we get an error while wbc->for_background is true, we should not clear uptodate on the page, rather SetPageError and SetPageDirty.</li> </ol> </blockquote> So you're saying we should treat it as a transient error rather than a permanent error. 
</blockquote> Yes, I'm proposing leaving the data in memory in case the user wants to try writing it somewhere else. </blockquote> And if it's getting IO errors because of USB stick pull? What then? </blockquote> I've been thinking about this. Ideally we want to pass some kind of notification all the way up to the desktop and tell the user to plug the damn stick back in. Then have the USB stick become the same blockdev that it used to be, and complete the writeback. We are so far from being able to do that right now that it's not even funny. <blockquote> <blockquote> <blockquote> <blockquote> <ol> <li>Background writebacks should skip pages which are PageError.</li> </ol> </blockquote> That seems decidedly dodgy in the case where there is a transient error - it requires a user to specifically run sync to get the data to disk after the transient error has occurred. Say they don't notice the problem because it's fleeting and doesn't cause any obvious problems? </blockquote> That's fair. What I want to avoid is triggering the same error every 30 seconds (or whatever the periodic writeback threshold is set to). </blockquote> So if kernel ring buffer overflows and so users miss the first error report, they'll have no idea that the data writeback is still failing? </blockquote> I wasn't thinking about kernel ringbuffer based reporting; I was thinking about errseq_t based reporting, so the application can tell the fsync failed and maybe does something application-level to recover like send the transactions across to another node in the cluster (or whatever this hypothetical application is). <blockquote> <blockquote> <blockquote> <blockquote> <ol> <li>for_sync writebacks should attempt one last write. Maybe it'll succeed this time. If it does, just ClearPageError. If not, we have somebody to report this writeback error to, and ClearPageUptodate.</li> </ol> </blockquote> Which may well be unmount. Are we really going to wait until unmount to report fatal errors? 
</blockquote> Goodness, no. The errors would be immediately reportable using the wb_err mechanism, as soon as the first error was encountered. </blockquote> But if there are no open files when the error occurs, that error won't get reported to anyone. Which means the next time anyone accesses that inode from a user context could very well be unmount or a third-party sync/syncfs().... </blockquote> Right. But then that's on the application. <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Wed, 18 Apr 2018 21:08:19 -0400 </code></pre> On Wed, Apr 18, 2018 at 05:40:37PM -0700, Matthew Wilcox wrote: <blockquote> I've been thinking about this. Ideally we want to pass some kind of notification all the way up to the desktop and tell the user to plug the damn stick back in. Then have the USB stick become the same blockdev that it used to be, and complete the writeback. We are so far from being able to do that right now that it's not even funny. </blockquote> Maybe we shouldn't be trying to do any of this in the kernel, or at least as little as possible in the kernel? Perhaps it would be better to do most of this as a device mapper hack; I suspect we'll need userspace help to figure out whether the user has plugged the same USB stick in, or a different USB stick, anyway. <hr /> <hr /> <pre><code>From: Christoph Hellwig <hch@...radead.org> Date: Thu, 19 Apr 2018 01:39:04 -0700 </code></pre> On Wed, Apr 18, 2018 at 12:52:19PM -0400, J. Bruce Fields wrote: <blockquote> <blockquote> Theodore Y. Ts'o - 10.04.18, 20:43: <blockquote> First of all, what storage devices will do when they hit an exception condition is quite non-deterministic. For example, the vast majority of SSD's are not power fail certified. What this means is that if they suffer a power drop while they are doing a GC, it is quite possible for data written six months ago to be lost as a result. 
The LBA could potentially be far, far away from any LBA's that were recently written, and there could have been multiple CACHE FLUSH operations in the time since the LBA in question was last written six months ago. No matter; for a consumer-grade SSD, it's possible for that LBA to be trashed after an unexpected power drop. </blockquote> </blockquote> Pointers to documentation or papers or anything? The only Google results I can find for "power fail certified" are your posts. I've always been confused by SSD power-loss protection, as nobody seems completely clear whether it's a safety or a performance feature. </blockquote> Devices from reputable vendors should always be power fail safe, bugs notwithstanding. What power-loss protection in marketing slides usually means is that an SSD has a non-volatile write cache. That is, once a write is ACKed, the data is persisted and no additional cache flush needs to be sent. This is a feature only available in expensive enterprise SSDs as the required capacitors are expensive. Cheaper consumer or boot drive SSDs have a volatile write cache, that is, we need to do a separate cache flush to persist data (REQ_OP_FLUSH in Linux). But a reasonable implementation of those still won't corrupt previously written data, they will just lose the volatile write cache that hasn't been flushed. Occasional bugs, bad actors or other issues might still happen. <hr /> <pre><code>From: "J. Bruce Fields" <bfields@...ldses.org> Date: Thu, 19 Apr 2018 10:10:16 -0400 </code></pre> On Thu, Apr 19, 2018 at 01:39:04AM -0700, Christoph Hellwig wrote: <blockquote> On Wed, Apr 18, 2018 at 12:52:19PM -0400, J. Bruce Fields wrote: <blockquote> <blockquote> Theodore Y. Ts'o - 10.04.18, 20:43: <blockquote> First of all, what storage devices will do when they hit an exception condition is quite non-deterministic. For example, the vast majority of SSD's are not power fail certified. 
What this means is that if they suffer a power drop while they are doing a GC, it is quite possible for data written six months ago to be lost as a result. The LBA could potentially be far, far away from any LBA's that were recently written, and there could have been multiple CACHE FLUSH operations in the time since the LBA in question was last written six months ago. No matter; for a consumer-grade SSD, it's possible for that LBA to be trashed after an unexpected power drop. </blockquote> </blockquote> Pointers to documentation or papers or anything? The only Google results I can find for "power fail certified" are your posts. I've always been confused by SSD power-loss protection, as nobody seems completely clear whether it's a safety or a performance feature. </blockquote> Devices from reputable vendors should always be power fail safe, bugs notwithstanding. What power-loss protection in marketing slides usually means is that an SSD has a non-volatile write cache. That is, once a write is ACKed, the data is persisted and no additional cache flush needs to be sent. This is a feature only available in expensive enterprise SSDs as the required capacitors are expensive. Cheaper consumer or boot drive SSDs have a volatile write cache, that is, we need to do a separate cache flush to persist data (REQ_OP_FLUSH in Linux). But a reasonable implementation of those still won't corrupt previously written data, they will just lose the volatile write cache that hasn't been flushed. Occasional bugs, bad actors or other issues might still happen. </blockquote> Thanks! That was my understanding too. But then the name is terrible. 
As is all the vendor documentation I can find: <blockquote> <a href="https://insights.samsung.com/2016/03/22/power-loss-protection-how-ssds-are-protecting-data-integrity-white-paper/">https://insights.samsung.com/2016/03/22/power-loss-protection-how-ssds-are-protecting-data-integrity-white-paper/</a> "Power loss protection is a critical aspect of ensuring data integrity, especially in servers or data centers." <a href="https://www.intel.com/content/.../ssd-320-series-power-loss-data-protection-brief.pdf">https://www.intel.com/content/.../ssd-320-series-power-loss-data-protection-brief.pdf</a> "Data safety features prepare for unexpected power-loss and protect system and user data." </blockquote> Why do they all neglect to mention that their consumer drives are also perfectly capable of well-defined behavior after power loss, just at the expense of flush performance? It's ridiculously confusing. <hr /> <pre><code>From: Matthew Wilcox <willy@...radead.org> Date: Thu, 19 Apr 2018 10:40:10 -0700 </code></pre> On Wed, Apr 18, 2018 at 09:08:19PM -0400, Theodore Y. Ts'o wrote: <blockquote> On Wed, Apr 18, 2018 at 05:40:37PM -0700, Matthew Wilcox wrote: <blockquote> I've been thinking about this. Ideally we want to pass some kind of notification all the way up to the desktop and tell the user to plug the damn stick back in. Then have the USB stick become the same blockdev that it used to be, and complete the writeback. We are so far from being able to do that right now that it's not even funny. </blockquote> Maybe we shouldn't be trying to do any of this in the kernel, or at least as little as possible in the kernel? Perhaps it would be better to do most of this as a device mapper hack; I suspect we'll need userspace help to figure out whether the user has plugged the same USB stick in, or a different USB stick, anyway. </blockquote> The device mapper target (dm-removable?) 
was my first idea too, but I kept thinking through use cases and I think we end up wanting this functionality in the block layer. Let's try a story. Stephen the PFY goes into the data centre looking to hotswap a failed drive. Due to the eight pints of lager he had for lunch, he pulls out the root drive instead of the failed drive. The air raid siren warbles and he realises his mistake, shoving the drive back in. CYOA: Currently: All writes are lost, calamities ensue. The PFY is fired. With dm-removable: Nobody thought to set up dm-removable on the root drive. Calamities still ensue, but now it's the BOFH's fault instead of the PFY's fault. Built into the block layer: After a brief hiccup while we reattach the drive to its block_device, the writes resume and nobody loses their job. <hr /> <pre><code>From: "Theodore Y. Ts'o" <tytso@....edu> Date: Thu, 19 Apr 2018 19:27:15 -0400 </code></pre> On Thu, Apr 19, 2018 at 10:40:10AM -0700, Matthew Wilcox wrote: <blockquote> With dm-removable: Nobody thought to set up dm-removable on the root drive. Calamities still ensue, but now it's the BOFH's fault instead of the PFY's fault. Built into the block layer: After a brief hiccup while we reattach the drive to its block_device, the writes resume and nobody loses their job. </blockquote> What you're talking about is a deployment issue, though. Ultimately the distribution will set up dm-removable automatically if the user requests it, much like it sets up dm-crypt automatically for laptop users upon request. My concern is that not all removable devices have a globally unique id number available in hardware so the kernel can tell whether or not it's the same device that has been plugged in. There are heuristics you could use -- for example, you could look at the file system uuid plus the last fsck time. But they tend to be very file system specific, and not things we would want to have in the kernel. 
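Ted's "same device or not?" heuristic can be sketched as a small userspace check. This is a hypothetical helper, not anything that exists in the kernel, udev or device mapper: the struct dev_identity type and same_device() function are invented for illustration, and the sketch assumes the caller has already read a hardware serial (if the device has one), the filesystem UUID and the last-fsck time from the superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical identity record for a removable device.  The field names
 * are illustrative only; they do not correspond to any kernel structure.
 */
struct dev_identity {
    char hw_serial[64];          /* empty string if the device reports none */
    unsigned char fs_uuid[16];   /* filesystem UUID read from the superblock */
    time_t last_fsck;            /* "last checked" time from the superblock */
};

/* Returns true if 'now' plausibly is the device that was pulled ('before'). */
static bool same_device(const struct dev_identity *before,
                        const struct dev_identity *now)
{
    assert(before != NULL && now != NULL);

    /* A globally unique hardware id, when both sides have one, settles it. */
    if (before->hw_serial[0] != '\0' && now->hw_serial[0] != '\0')
        return strcmp(before->hw_serial, now->hw_serial) == 0;

    /*
     * Otherwise fall back to the filesystem-specific heuristic.  The UUID
     * alone cannot tell a stick from a dd'd copy of it, so also require the
     * same last-fsck time.  This narrows things down but is still a guess.
     */
    return memcmp(before->fs_uuid, now->fs_uuid, sizeof before->fs_uuid) == 0 &&
           before->last_fsck == now->last_fsck;
}
```

A caller would compare the record saved when the stick disappeared against the record read from each newly attached device, and only resume writeback on a match. The fallback branch depends on superblock fields that differ per filesystem, which is exactly why Ted would rather keep it out of the kernel.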
<hr /> <pre><code>From: Dave Chinner <david@...morbit.com> Date: Fri, 20 Apr 2018 09:28:59 +1000 </code></pre> On Wed, Apr 18, 2018 at 05:40:37PM -0700, Matthew Wilcox wrote: <blockquote> On Thu, Apr 19, 2018 at 10:13:43AM +1000, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:38:14PM -0700, Matthew Wilcox wrote: <blockquote> On Sat, Apr 14, 2018 at 11:47:52AM +1000, Dave Chinner wrote: <blockquote> On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote: <blockquote> <ol> <li>If we get an error while wbc->for_background is true, we should not clear uptodate on the page, rather SetPageError and SetPageDirty.</li> </ol> </blockquote> So you're saying we should treat it as a transient error rather than a permanent error. </blockquote> Yes, I'm proposing leaving the data in memory in case the user wants to try writing it somewhere else. </blockquote> And if it's getting IO errors because of USB stick pull? What then? </blockquote> I've been thinking about this. Ideally we want to pass some kind of notification all the way up to the desktop and tell the user to plug the damn stick back in. Then have the USB stick become the same blockdev that it used to be, and complete the writeback. We are so far from being able to do that right now that it's not even funny. </blockquote> nod But in the meantime, device unplug (should give ENODEV, not EIO) is a fatal error and we need to toss away the data. <blockquote> <blockquote> <blockquote> <blockquote> <blockquote> <ol> <li>Background writebacks should skip pages which are PageError.</li> </ol> </blockquote> That seems decidedly dodgy in the case where there is a transient error - it requires a user to specifically run sync to get the data to disk after the transient error has occurred. Say they don't notice the problem because it's fleeting and doesn't cause any obvious problems? </blockquote> That's fair. 
What I want to avoid is triggering the same error every 30 seconds (or whatever the periodic writeback threshold is set to). </blockquote> So if the kernel ring buffer overflows and users miss the first error report, they'll have no idea that the data writeback is still failing? </blockquote> I wasn't thinking about kernel ringbuffer based reporting; I was thinking about errseq_t based reporting, so the application can tell the fsync failed and maybe does something application-level to recover like send the transactions across to another node in the cluster (or whatever this hypothetical application is). </blockquote> But if it's still failing, then we should be still trying to report the error. i.e. if fsync fails and the page remains dirty, then the next attempt to write it is a new error and fsync should report that. IOWs, I think we should be returning errors at every occasion errors need to be reported if we have a persistent writeback failure... <blockquote> <blockquote> <blockquote> <blockquote> <blockquote> <ol> <li>for_sync writebacks should attempt one last write. Maybe it'll succeed this time. If it does, just ClearPageError. If not, we have somebody to report this writeback error to, and ClearPageUptodate.</li> </ol> </blockquote> Which may well be unmount. Are we really going to wait until unmount to report fatal errors? </blockquote> Goodness, no. The errors would be immediately reportable using the wb_err mechanism, as soon as the first error was encountered. </blockquote> But if there are no open files when the error occurs, that error won't get reported to anyone. Which means the next time anyone accesses that inode from a user context could very well be unmount or a third party sync/syncfs().... </blockquote> Right. But then that's on the application. </blockquote> Which we know don't do the right thing. 
Seems like a lot of hoops to jump through given it still won't work if the application isn't changed to support Linux-specific error handling requirements... <hr /> <pre><code>From: Jan Kara <jack@...e.cz> Date: Sat, 21 Apr 2018 18:59:54 +0200 </code></pre> On Fri 13-04-18 07:48:07, Matthew Wilcox wrote: <blockquote> On Tue, Apr 10, 2018 at 03:07:26PM -0700, Andres Freund wrote: <blockquote> I don't think that's the full issue. We can deal with the fact that an fsync failure is edge-triggered if there's a guarantee that every process doing so would get it. The fact that one needs to have an FD open from before any failing writes occurred to get a failure, THAT'S the big issue. Beyond postgres, it's a pretty common approach to do work on a lot of files without fsyncing, then iterate over the directory fsyncing everything, and then assume you're safe. But unless I severely misunderstand something that'd only be safe if you kept an FD for every file open, which isn't realistic for pretty obvious reasons. </blockquote> While accepting that under memory pressure we can still evict the error indicators, we can do a better job than we do today. The current design of error reporting says that all errors which occurred before you opened the file descriptor are of no interest to you. I don't think that's necessarily true, and it's actually a change of behaviour from before the errseq work. Consider Stupid Task A which calls open(), write(), close(), and Smart Task B which calls open(), write(), fsync(), close() operating on the same file. If A goes entirely before B and encounters an error, before errseq_t, B would see the error from A's write. If A and B overlap, even a little bit, then B still gets to see A's error today. But if writeback happens for A's write before B opens the file then B will never see the error. B doesn't want to see historical errors that a previous invocation of B has already handled, but we know whether anyone has seen the error or not. 
So here's a patch which restores the historical behaviour of seeing old unhandled errors on a fresh file descriptor: Signed-off-by: Matthew Wilcox <a href="mailto:mawilcox@...rosoft.com">mawilcox@...rosoft.com</a> </blockquote> So I agree with going back to the old semantics, where errors that occurred before a file was opened are reported to someone at least once. As the PG case shows, apps are indeed relying on the old behavior. Unreliable as it is, it ends up doing the right thing for these apps in 99% of cases and we shouldn't break them (BTW IMO the changelog should contain a note that this fixes a regression for PostgreSQL, a reference to this thread and CC to stable). Anyway feel free to add: Reviewed-by: Jan Kara <a href="mailto:jack@...e.cz">jack@...e.cz</a> Oh, and to make myself clear, I do think we need to find a better way of reporting IO errors. I consider this just an immediate band-aid to avoid userspace regressions. <blockquote> <pre><code>diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..093f1fba4ee0 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;
 
-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;
</code></pre> </blockquote> <hr /> <pre><code>From: Jan Kara <jack@...e.cz> Date: Sat, 21 Apr 2018 20:14:29 +0200 </code></pre> On Thu 12-04-18 07:09:14, Jeff Layton wrote: <blockquote> On Wed, 2018-04-11 at 20:02 -0700, Matthew Wilcox wrote: <blockquote> At the moment, when we open a file, we sample the current state of the writeback error and only report new errors. We could set it to zero instead, and report the most recent error as soon as anything happens which would report an error. That way err = close(open("file")); would report the most recent error. That's not going to be persistent across the data structure for that inode being removed from memory; we'd need filesystem support for persisting that. But maybe it's "good enough" to only support it for recent files. Jeff, what do you think? </blockquote> I hate it :). We could do that, but....yecchhhh. Reporting errors only in the case where the inode happened to stick around in the cache seems too unreliable for real-world usage, and might be problematic for some use cases. I'm also not sure it would really be helpful. </blockquote> So this is never going to be perfect, but I think we could do well enough by: <ol> <li>Marking inodes that hit an IO error.</li> <li>If the inode gets evicted from memory, storing the fact that we hit an error for this inode in a more space-efficient data structure (sparse bitmap, radix tree, extent tree, whatever).</li> <li>If the underlying device gets destroyed, just switching the whole SB to an error state and forgetting the per-inode info.</li> <li>If there's too much per-inode error info (probably a per-fs configurable limit in terms of number of inodes), yelling in the kernel log, switching the whole fs to the error state and forgetting the per-inode info.</li> </ol> This way there won't be silent loss of IO errors. Memory usage would be reasonably limited. 
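Jan's four steps can be modeled in a few dozen lines of userspace C. This is a hypothetical sketch, not kernel code: struct inode_model, struct fs_error_state and the fs_error_* functions are invented names, a plain fixed-size array stands in for the sparse bitmap, radix tree or extent tree he mentions, and clearing an entry once its error has been reported is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an in-memory inode; not a kernel type. */
struct inode_model {
    unsigned long ino;
    bool io_error;              /* step 1: set when writeback fails */
};

/* Hypothetical per-filesystem record of IO errors; not a kernel type. */
struct fs_error_state {
    unsigned long *evicted;     /* step 2: numbers of evicted inodes that had errors */
    size_t count;
    size_t limit;               /* step 4: per-fs configurable limit */
    bool whole_fs_error;        /* steps 3 and 4: superblock-wide error state */
};

static bool fs_error_init(struct fs_error_state *s, size_t limit)
{
    s->evicted = calloc(limit ? limit : 1, sizeof *s->evicted);
    s->count = 0;
    s->limit = limit;
    s->whole_fs_error = false;
    return s->evicted != NULL;
}

static void fs_error_destroy(struct fs_error_state *s)
{
    free(s->evicted);
    s->evicted = NULL;
}

/* Step 1: remember on the inode itself that writeback failed. */
static void fs_error_mark(struct inode_model *inode)
{
    inode->io_error = true;
}

/* Steps 3 and 4: yell, switch the whole fs to error state, drop per-inode info. */
static void fs_error_escalate(struct fs_error_state *s, const char *why)
{
    fprintf(stderr, "fs switched to error state: %s\n", why);
    s->whole_fs_error = true;
    s->count = 0;
}

/* Step 3: the underlying device went away. */
static void fs_error_on_device_gone(struct fs_error_state *s)
{
    fs_error_escalate(s, "underlying device destroyed");
}

/* Steps 2 and 4: the inode is being evicted from memory. */
static void fs_error_on_evict(struct fs_error_state *s, const struct inode_model *inode)
{
    if (!inode->io_error || s->whole_fs_error)
        return;                 /* nothing to remember, or already covered */
    if (s->count == s->limit) {
        fs_error_escalate(s, "too many inodes with IO errors");
        return;
    }
    assert(s->count < s->limit);
    s->evicted[s->count++] = inode->ino;
}

/* Would a later fsync() on this inode have an error to report? */
static bool fs_error_pending(const struct fs_error_state *s, const struct inode_model *inode)
{
    if (s->whole_fs_error || inode->io_error)
        return true;
    for (size_t i = 0; i < s->count; i++)
        if (s->evicted[i] == inode->ino)
            return true;
    return false;
}
```

With a limit of 2, the third errored inode to be evicted trips the escalation, after which every inode on that filesystem reports an error; the limit is the knob that trades memory for precision.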
It could happen that the whole fs would switch to the error state "prematurely", but if that's a problem for the machine, the admin could tune the limit on the number of inodes to keep IO errors for... <blockquote> I think the crux of the matter here is not really about error reporting, per se. </blockquote> I think this is related but a different question. <blockquote> I asked this at LSF last year, and got no real answer: When there is a writeback error, what should be done with the dirty page(s)? Right now, we usually just mark them clean and carry on. Is that the right thing to do? One possibility would be to invalidate the range that failed to be written (or the whole file) and force the pages to be faulted in again on the next access. It could be surprising for some applications to not see the results of their writes on a subsequent read after such an event. Maybe that's ok in the face of a writeback error though? IDK. </blockquote> I can see the admin wanting to kill the machine with OOM rather than having to deal with data loss due to IO errors (e.g. if he has HA server failover set up). Or to retry for some time before dropping the dirty data. Or to do what we do now (possibly with invalidating pages as you say). As Dave said elsewhere, there's not one strategy that's going to please everybody. So it might be beneficial to have this configurable, like XFS has it for metadata. OTOH, if I look at the problem from an application developer's POV, most apps will just declare game over in the face of IO errors (if they take care to check for them at all). And the sophisticated apps that will try some kind of error recovery have to be prepared for the data to be just gone (as depending on what exactly the kernel does is rather fragile), so I'm not sure how much practical value configurable behavior on writeback errors would bring. <hr />