DynamoDB: the basics


Error: The provided key element does not match the schema

Fix: for an AWS endpoint, when trying to do an UpdateItem action via API Gateway, you need to set the method to POST instead of PUT, since the AWS endpoint only accepts POST calls
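For reference, this is roughly the raw request shape the DynamoDB endpoint expects (the table, key, and attribute names below are made up for illustration); the point is that the action goes in the `X-Amz-Target` header, and the method is always POST:

```python
# Sketch of an UpdateItem request as DynamoDB's endpoint sees it.
# Table/key/attribute names are hypothetical.
method = "POST"  # never PUT: the AWS endpoint only accepts POST
headers = {
    "Content-Type": "application/x-amz-json-1.0",
    "X-Amz-Target": "DynamoDB_20120810.UpdateItem",  # the action lives here, not in the verb
}
body = {
    "TableName": "MyTable",
    "Key": {"id": {"S": "item-1"}},
    "UpdateExpression": "SET #n = :v",
    "ExpressionAttributeNames": {"#n": "name"},
    "ExpressionAttributeValues": {":v": {"S": "new-name"}},
}
```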


Solidity: the basics

Online IDE: Remix

Particulars of remix:

  • JavaScript VM: simulates the network in your browser's memory, so you can speed up development. It doesn't depend on test networks being down or slow
  • Ganache is another alternative: it runs a network locally, and you can just pick it as the Web3 provider in the deploy menu


  • variable initialization defaults
    • strings are empty, ints are zero, booleans are false, so there are no null pointer errors
    • public variables automatically generate a getter function
  • Example of throwing a validation exception
    • require(msg.sender == owner, "You are not the owner");
      • msg is a special object that encapsulates the address that calls the smart contract
      • in this case, we return the specific error, and the whole transaction rolls back
    • transactions are atomic: if an exception happens, the whole transaction reverts. There is no catch mechanism, so exceptions cascade up through nested calls (low-level functions like address.send are the exception: they signal failure with a return value instead of throwing)
    • difference between assert vs require:
      • require, when it reverts, refunds the remaining gas
      • require is used to validate the user's input
      • require lets you return an error message
      • require-style exceptions are also triggered by: function calls that do not return, receiving Ether in a function without the payable modifier, and transfer errors
      • you can also use revert("your error message here") instead of require, to throw explicitly
      • assert doesn't let you customize the error message
      • assert consumes all the remaining gas (in Solidity versions before 0.8)
      • assert validates things that are unexpected, i.e. invariants (out-of-bounds array access, division by zero, calling an uninitialized internal function type, or assert(false))
  • Mappings are like hash maps, except they don't have a length; if you need one, it has to be stored separately
    • their key type can be pretty much any elementary value
    • their value type can be anything
  • Structs allow you to create your own data type. But they can't contain a member of their own type (no recursive structs)
  • Arrays can have either a fixed or a dynamic size. Operations that depend on the length of arrays can actually result in higher gas consumption
  • Enums are available
  • Functions
    • View function: reads from state but doesn't modify it ("constant" functions in older versions)
    • Pure function: neither reading nor modifying state
    • Constructor: only called once, during deployment; it is either public or internal
    • Fallback function: it is called when you send ether to a contract without specifying the function to call (kind of the default)
      • it can only be external
    • Function visibility:
      • public: can be called from anywhere, both inside and outside the contract
      • private: can only be called from within the defining contract
      • external: can be called from other contracts or externally via transactions, but not internally (other than with this.)
      • internal: only from the contract itself or derived contracts; can't be invoked by transactions
  • Inheritance, Modifiers, Importing of files:
    • It is similar to higher-order functions in javascript, where you can compose a modifier function and attach it to other functions, and it will just run automagically.
    • Usually used to do precondition checks, where you need to do the same ones in multiple functions
  • ABI: application binary interface
    • it contains all the functions / arguments / return values, everything needed to interact with a smart contract
    • the more instructions the compilation of your program produces, the more gas you will have to pay for transactions against that contract
    • Gas also depends on how congested the network is, not only on the complexity of the operations your contract executes
    • In the network, the more Gas you pay, the faster your transaction gets processed in the queue of waiting transactions
  • Libraries: they are executed in the context of your smart contract. They can't receive Ether.
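The require/revert/assert behavior and modifiers described above can be sketched in one toy contract (the contract, function names, and pragma version here are illustrative, not from the original notes):

```solidity
pragma solidity ^0.8.0;

contract Vault {
    address public owner;   // public: a getter is generated automatically
    uint256 public total;

    constructor() {
        owner = msg.sender;
    }

    // modifier: a reusable precondition check, attached to functions below
    modifier onlyOwner() {
        require(msg.sender == owner, "You are not the owner");
        _; // the decorated function's body runs here
    }

    function deposit() external payable {
        // require: validate user input; refunds remaining gas on revert
        require(msg.value > 0, "deposit must be positive");
        total += msg.value;
        // assert: check an invariant that should never fail
        assert(total >= msg.value);
    }

    function fail() external pure {
        // revert: throw explicitly with a custom message
        revert("always fails");
    }

    function sweep() external onlyOwner {
        // only the owner gets past the modifier's require
        payable(owner).transfer(address(this).balance);
    }
}
```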


AWS: SAM gotchas

If your Lambda is just returning before expected, check the default Timeout in template.yaml; most likely you will need a bigger number there
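For reference, Timeout sits under the function's Properties. A sketch (the function name, handler, and runtime are made up; only Timeout is the point here):

```yaml
Resources:
  MyFunction:                          # hypothetical function name
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler      # illustrative handler/runtime
      Runtime: python3.9
      Timeout: 30                      # seconds; the default of 3 is often too low
```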


sam init

In case you don't know where to start, that will walk you through the process and download a sample app for ya

sam build

packs the latest changes; it will also pick up anything new you put in requirements.txt

sam local invoke "YourFunctionName" -e events/yourevent.json

run your function locally, with your own event

sam deploy

put it out there


Redshift: unable to connect newly created instance

Problem: you just created a new instance, and even though you told it to be publicly accessible, you can't connect to it using the provided endpoint.

Solution: you need to explicitly add your current IP address to the security group you are using. The default security group is misleading: even though it says it will accept all traffic from everywhere, it doesn't (sad panda). Once you add a new security group and attach it to the Redshift instance, you will be fine.

Source of the solution: the infamous Stack Overflow.


puppeteer: a node.js package to simulate browsers

It can be used for web scraping as well.

Setting up one:

npm init --yes
npm install puppeteer

Create an index.js file with the following:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com'); // put the URL you want to visit here
  await page.screenshot({ path: 'takingasnapshotofyourpagehere.png' });

  await browser.close();
})();

Express: the basics

On returning content

# automagically detects content and set the right headers:


# explicitly return json objects:


# you are more interested in sending http codes than actual return objects:



mongodb: the basics

# check existing collections:

show collections

# drop a collection:

db.yourCollectionName.drop()


Apache Spark: the basics

RDD: Resilient Distributed Dataset

It's an encapsulation of a collection of data, distributed across clusters automatically. RDDs are immutable.

You can apply transformations (which return a new RDD with, for example, filtered data) and actions (like first(), which returns the first item in the RDD)

RDDs are resilient: if nodes are lost, the partitions they held can be recreated automagically (from the lineage of transformations)

Transformations on RDDs are lazily evaluated. For example, if we have lines that open a file and then filter it, the opening of the file won't happen right away: Spark builds up the plan first and only does the work when an action requires it, which lets it, for example, keep only the filtered data instead of materializing the whole file.


SparkContext: it is a connection to a computer cluster, used to build RDDs

Example of loading an RDD from external storage:

sc = SparkContext("local", "textfile") # builds the context first
lines = sc.textFile("in/uppercase.text") # creates the RDD


RDD transformations

They do not mutate the current RDD; they return a new one.

filter() # returns elements passing the filter function

map() # applies the map function to each element of the original RDD and returns the results in a new one

RDD actions

collect: brings the entire RDD into the driver program, usually to persist it to disk. Memory intensive, so make sure it is used on filtered, small datasets

count / countByValue: count number of rows, or number of unique values

take: return a subset of the RDD

saveAsTextFile: outputs to a storage in text mode

reduce: apply a lambda function to all the elements, two at a time, until we get a single result in return

persist: it will keep a copy of a produced RDD in memory, available fast for all nodes. You can pass the storage level you prefer (DISK_ONLY, MEMORY_ONLY, etc). unpersist() removes it from the cache.


Java: the basics

Use int, long (primitives), instead of their object wrappers (Integer, Long)

Primitives are the atomic, basic data types; unless you know what you are doing, stick to those.

They (primitives) are passed by value. Long and Integer are the boxed object forms of the primitives, to be avoided unless you actually need an object, for example in collections or when a null value is meaningful.

sample of using inline filters (the stream fragment here is reconstructed from context): someList.stream().filter(s -> s.getId() == COMPARE_ID).findFirst()

where s is a particular member of someList, with getId() as a method, and we are just picking the one whose id matches COMPARE_ID in this case
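The same idea as a self-contained example (the Item class and the ids are made up):

```java
import java.util.List;
import java.util.Optional;

public class FilterDemo {
    static class Item {
        private final int id;
        Item(int id) { this.id = id; }
        int getId() { return id; }
    }

    public static void main(String[] args) {
        final int COMPARE_ID = 2;
        List<Item> someList = List.of(new Item(1), new Item(2), new Item(3));
        // pick the first element whose id matches COMPARE_ID
        Optional<Item> match = someList.stream()
                .filter(s -> s.getId() == COMPARE_ID)
                .findFirst();
        System.out.println(match.isPresent() ? match.get().getId() : -1); // prints 2
    }
}
```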

Spring / autowire

  • when you see it on a class, it pretty much means “you are going to need one of these, and I am going to wire it for you”. Example:
public class SomeClass {
    private ChallengeManager challengeManager;

    @Autowired
    public void setChallengeManager(@Qualifier(SpringConstants.COMPONENT_CHALLENGE_MANAGER) ChallengeManager challengeManager) {
        this.challengeManager = challengeManager;
    }
}

so when SomeClass gets spawned, the dependency marked by @Autowired will automagically be injected

the @Qualifier(SOMECONSTANT) is to ensure it is the class you want to autowire

in complex systems, there may be more than one ChallengeManager, so that qualifier and constant make sure we are autowiring the right one

Throwing and catching

if a method signature is marked with "throws", there should be a try/catch somewhere that deals with the specified exception

but if you want the callers of the class / method to deal with it instead, you can just add "throws ExceptionName" to the signature
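Both options in one sketch (the exception type and method names are illustrative):

```java
import java.io.IOException;

public class ThrowsDemo {
    // option 1: handle the exception here with try/catch
    static String readSafely() {
        try {
            return risky();
        } catch (IOException e) {
            return "fallback";
        }
    }

    // option 2: declare "throws" in the signature and let the caller deal with it
    static String readOrFail() throws IOException {
        return risky();
    }

    static String risky() throws IOException {
        throw new IOException("boom");
    }
}
```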


AWS Glue: the basics

  1. Crawl your data source first
    1. to create a catalog
    2. and table definitions
  2. Add a job to process your crawled data

That’s all!