Ask HN: Pandas to Polars migration, from 200s timeouts to under 4s. Anyone else?
Posted by Franco-m 3 days ago
I built a small project that cleans CSV files for ML and data analysis. Using pandas in the backend, cleaning a 65 MB file took over 200 seconds and frequently timed out. Just parsing the upload took over 120 seconds. After migrating to Polars and adding in-memory DataFrame caching between the upload and cleaning endpoints, the same file now cleans in under 4 seconds. Smaller files feel nearly instant. Has anyone else made this migration in a production app? Curious about edge cases you hit, especially with type inference, null handling, and lazy vs eager evaluation.
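For anyone reading along, here is a minimal sketch of the caching half of that change: parse the upload once, keep the parsed result in memory under an ID, and let the cleaning endpoint reuse it instead of re-parsing. This is not the project's actual code. The names `UploadCache`, `parse_csv`, `upload`, and `clean` are hypothetical, and the parser is a stdlib `csv` stand-in so the snippet runs without dependencies. In a Polars backend the parser would presumably be something like `pl.read_csv` and the cached object a DataFrame.

```python
import csv
import io
import uuid
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def parse_csv(raw: bytes) -> Rows:
    """Stand-in parser (stdlib only). Assumption: a Polars app would
    build a DataFrame from the uploaded bytes here instead."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


class UploadCache:
    """Hypothetical in-memory cache shared by the upload and clean endpoints."""

    def __init__(self, parser: Callable[[bytes], Rows] = parse_csv) -> None:
        self._parser = parser
        self._frames: Dict[str, Rows] = {}
        self.parse_count = 0  # counts how often the expensive parse runs

    def upload(self, raw: bytes) -> str:
        """Parse once at upload time and return an ID for later requests."""
        upload_id = uuid.uuid4().hex
        self._frames[upload_id] = self._parser(raw)
        self.parse_count += 1
        return upload_id

    def clean(self, upload_id: str) -> Rows:
        """Reuse the cached parse; drop rows with an empty or missing field
        and strip surrounding whitespace from the rest."""
        rows = self._frames[upload_id]
        return [
            {key: value.strip() for key, value in row.items()}
            for row in rows
            if all(isinstance(v, str) and v.strip() for v in row.values())
        ]


if __name__ == "__main__":
    cache = UploadCache()
    uid = cache.upload(b"name,age\n alice ,30\nbob,\n")
    print(cache.clean(uid))
    print("parses so far:", cache.parse_count)
```

A plain dict like this only works with a single long-lived process and has no eviction, so a real deployment would likely need a size or time limit on cached entries.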
Comments
Comment by ShawnCCS 2 days ago
Next time, you can try chDB (the in-process version of ClickHouse). It is 100% pandas-compatible and brings ClickHouse's OLAP power.
Official link: https://clickhouse.com/chdb
Disclosure: I work for ClickHouse.